Method and apparatus for processing natural language using auto-intersection

ABSTRACT

Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.

CROSS-REFERENCE TO RELATED APPLICATIONS

Priority is claimed from U.S. Provisional Application No. 60/481,639, filed Nov. 14, 2003, entitled “Method And Apparatus For Processing Natural Language Using Multi-Tape Automata”, by the same inventors and assignee, which is hereby incorporated herein by reference. In addition, cross-reference is made to the following U.S. patent application that is concurrently filed with this patent application, assigned to the same assignee as this patent application, incorporated in this patent application by reference, and claims priority to U.S. Provisional Patent Application Ser. No. 60/481,639, filed Nov. 14, 2003: U.S. patent application Ser. No. __/___,___ (File No. A3354-US-NP), entitled “Method And Apparatus For Processing Natural Language Using Tape-Intersection”.

BACKGROUND AND SUMMARY

The present invention relates to a method and apparatus for processing natural language using operations performed on weighted and non-weighted multi-tape automata.

Finite state automata (FSAs) are mathematically well defined and offer many practical advantages. They allow for fast processing of input data and are easily modifiable and combinable by well defined operations. Consequently, FSAs are widely used in Natural Language Processing (NLP) as well as many other fields. A general discussion of FSAs is given in Patent Application Publication US 2003/0004705 A1 and in “Finite State Morphology” by Beesley and Karttunen (CSLI Publications, 2003), which are incorporated herein by reference.

Weighted finite state automata (WFSAs) combine the advantages of ordinary FSAs with the advantages of statistical models, such as Hidden Markov Models (HMMs), and hence have a potentially wider scope of application than FSAs. Weighted multi-tape automata (WMTAs) have yet more advantages. For example, WMTAs permit the separation of different types of information used in NLP (e.g., surface word form, lemma, POS-tag, domain-specific information) over different tapes, and preserve intermediate results of different steps of NLP on different tapes. Operations on WMTAs may be specified to operate on one, several, or all tapes.

While some basic WMTA operations, such as union, concatenation, projection, and complementary projection, have been defined for a sub-class of non-weighted multi-tape automata (see for example the publication by Kaplan and Kay, “Regular models of phonological rule systems”, in Computational Linguistics, 20(3):331-378, 1994) and implemented (see for example the publication by Kiraz and Grimley-Evans, “Multi-tape automata for speech and language systems: A prolog implementation”, in D. Woods and S. Yu, editors, Automata Implementation, number 1436 in Lecture Notes in Computer Science, Springer Verlag, Berlin, Germany, 1998), there continues to be a need for improved, simplified, and more efficient operations for processing WMTAs to make use of these advantages in natural language processing.

In accordance with the invention, there is provided a method and apparatus for using weighted multi-tape automata (WMTAs) in natural language processing (NLP) that includes morphological analysis, part-of-speech (POS) tagging, disambiguation, and entity extraction. In performing NLP, operations are employed that perform cross-product, auto-intersection, and tape-intersection (i.e., single-tape intersection and multi-tape intersection) of automata. Such operations may be performed using transition-wise processing on weighted or non-weighted multi-tape automata.

In accordance with one aspect of the invention there is provided, in a system for processing natural language, a method for intersecting a first tape and a second tape of a multi-tape automaton (MTA) that has a plurality of n tapes and a plurality of paths. The method includes generating a string tuple $\langle s_1, \ldots, s_n \rangle$ having a string s for each of the n tapes of each path of the MTA and comparing the string $s_j$ of the first tape with the string $s_k$ of the second tape in the string tuple. If the strings $s_j$ and $s_k$ are equal, the string tuple is retained in the MTA. However, if the strings $s_j$ and $s_k$ are not equal, the MTA is restructured to remove the string tuple.

In accordance with another aspect of the invention, there is provided, in a system for processing natural language, a method for intersecting a first tape and a second tape of an input multi-tape automaton (MTA) that has a plurality of tapes and a plurality of paths. The method includes computing a first limit and a second limit of the input MTA. An output MTA is constructed that intersects the first tape and the second tape using the second limit to delimit its construction. Transitions are removed along paths in the output MTA during construction except for those transitions of paths having similar labels on the first selected tape and the second selected tape. Whether the constructed output MTA is regular is determined using the first limit. If the output MTA is determined to be regular, then the output MTA is provided as a complete solution to the intersection of the first tape and the second tape of the input MTA. If the output MTA is determined not to be regular, then the output MTA is provided as a partial solution to the intersection of the first tape and the second tape of the input MTA.

It will be appreciated that the present invention has the following advantages over weighted 1-tape or 2-tape processing of automata because it allows for: (a) the separation of different types of information used in NLP (e.g., surface form, lemma, POS-tag, domain-specific information, etc.) over different tapes; (b) the preservation of some or all intermediate results of various NLP steps on different tapes; and (c) the possibility of defining and implementing contextual replace rules referring to different types of information on different tapes.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will become apparent from the following description read in conjunction with the accompanying drawings wherein the same reference numerals have been applied to like parts and in which:

FIG. 1 is a flow diagram that sets forth steps for performing an auto-intersection operation;

FIG. 2 presents two weighted three-tape automata for illustrating an example of the auto-intersection operation;

FIG. 3 is a flow diagram that sets forth steps for performing a single-tape intersection operation of a first WMTA and a second WMTA;

FIG. 4 presents two WMTAs for illustrating a simple example of the single-tape intersection operation;

FIG. 5 sets forth a method in pseudocode for performing a cross-product operation in an embodiment with path alignment;

FIG. 6 sets forth a first method in pseudocode for performing an auto-intersection operation;

FIG. 7 presents two automata for illustrating the method for performing the auto-intersection operation set forth in FIG. 6;

FIG. 8 presents two automata for illustrating an example of the method for performing the auto-intersection operation set forth in FIG. 6 that fails to perform auto-intersection;

FIG. 9 sets forth a second method in pseudocode for performing an auto-intersection operation;

FIGS. 10 and 11 each present two automata for illustrating the method for performing the auto-intersection operation set forth in FIG. 9 which results in a WMTA $A^{(n)}$ that is regular;

FIG. 12 presents two automata for illustrating an example of the method for performing the auto-intersection operation set forth in FIG. 9 which results in a WMTA $A^{(n)}$ that is not regular;

FIG. 13 sets forth a first method in pseudocode for performing a single-tape intersection operation;

FIG. 14 presents Mohri's epsilon-filter $A_{\varepsilon}$ and two automata $A_1$ and $A_2$;

FIG. 15 sets forth a method in pseudocode of a second embodiment for performing the single-tape intersection operation;

FIG. 16 presents an automaton for illustrating an operation for part-of-speech (POS) disambiguation and its use in natural language processing;

FIG. 17 illustrates one path of the automaton shown in FIG. 16;

FIG. 18 illustrates the intersection of an arc with a path of a lexicon automaton;

FIG. 19 illustrates the intersection of a path of a sentence automaton with a path of an HMM automaton;

FIG. 20 illustrates an example of a (classical) weighted transduction cascade;

FIG. 21 illustrates an example of a weighted transduction cascade using multi-tape intersection; and

FIG. 22 illustrates a general purpose computer system for carrying out natural language processing in accordance with the present invention.

DETAILED DESCRIPTION

Outline of Detailed Description

-   A. Definitions
    -   A.1 Semirings
    -   A.2 Weighted Automata
    -   A.3 Weighted Multi-Tape Automata
-   B. Operations On Multi-Tape Automata
    -   B.1 Pairing and Concatenation
    -   B.2 Projection and Complementary Projection
    -   B.3 Cross-Product
    -   B.4 Auto-Intersection
    -   B.5 Single-Tape Intersection
    -   B.6 Multi-Tape Intersection
    -   B.7 Transition Automata and Transition-Wise Processing
-   C. Methods For Performing MTA Operations
    -   C.1 Cross-Product
        -   C.1.1 Conditions
        -   C.1.2 Path Concatenation Method
        -   C.1.3 Path Alignment Method
        -   C.1.4 Complexity
    -   C.2 Auto-Intersection
        -   C.2.1 Conditions Of First Method
        -   C.2.2 First Method
        -   C.2.3 Example Of First Method
        -   C.2.4 Conditions Of Second Method
        -   C.2.5 Second Method
            -   C.2.5.A Compile Limits
            -   C.2.5.B Construct Auto-Intersection
            -   C.2.5.C Test Regularity
        -   C.2.6 Examples Of Second Method
    -   C.3 Single-Tape Intersection
        -   C.3.1 Conditions
        -   C.3.2 First Embodiment
        -   C.3.3 Mohri's Epsilon-Filter
        -   C.3.4 Second Embodiment
        -   C.3.5 Complexity
    -   C.4 Multi-Tape Intersection
        -   C.4.1 Conditions
        -   C.4.2 Embodiments
        -   C.4.3 Example
-   D. Applications
    -   D.1 General Use
    -   D.2 Building A Lexicon From A Corpus
    -   D.3 Enhancing A Lexicon With Lemmas
    -   D.4 Normalizing A Lexicon
    -   D.5 Using A Lexicon
    -   D.6 Searching For Similarities
    -   D.7 Preserving Intermediate Transduction Results
    -   D.8 Example System
-   E. Miscellaneous

A. Definitions

This section recites basic definitions of algebraic structures that are used in describing the present invention, such as “monoid”, “semiring”, and “weighted automaton” (which are described in more detail in the following publications, which are incorporated herein by reference: Eilenberg, “Automata, Languages, and Machines”, volume A, Academic Press, San Diego, Calif., USA, 1974; and Kuich and Salomaa, “Semirings, Automata, Languages”, Number 5 in EATCS Monographs on Theoretical Computer Science, Springer Verlag, Berlin, Germany, 1986), and of “weighted multi-tape automaton”, based on the definitions of multi-tape automaton (which are described in more detail in the following publication, which is incorporated herein by reference: Elgot and Mezei, “On relations defined by generalized finite automata”, IBM Journal of Research and Development, 9:47-68, 1965).

A.1 Semirings

A monoid consists of a set $M$, an associative binary operation $\circ$ on $M$, and a neutral element $\bar{1}$ such that $\bar{1} \circ a = a \circ \bar{1} = a$ for all $a \in M$. A monoid is called commutative iff $a \circ b = b \circ a$ for all $a, b \in M$.

The set $K$ with two binary operations $\oplus$ (collection) and $\otimes$ (extension) and two elements $\bar{0}$ and $\bar{1}$ is called a semiring if it satisfies the following properties:

-   (a) $\langle K, \oplus, \bar{0} \rangle$ is a commutative monoid;
-   (b) $\langle K, \otimes, \bar{1} \rangle$ is a monoid;
-   (c) $\otimes$ is left-distributive and right-distributive over $\oplus$: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$, $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$, $\forall a, b, c \in K$;
-   (d) $\bar{0}$ is an annihilator for $\otimes$: $\bar{0} \otimes a = a \otimes \bar{0} = \bar{0}$, $\forall a \in K$.

A generic semiring $\mathcal{K}$ is denoted as $\langle K, \oplus, \otimes, \bar{0}, \bar{1} \rangle$.

Some methods for processing automata require semirings to have specific properties. Composition, for example, requires a semiring to be commutative (which is described in more detail in the following publications, incorporated herein by reference: Pereira and Riley, “Speech recognition by composition of weighted finite automata”, in Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, MIT Press, Cambridge, Mass., USA, pages 431-453, 1997; and Mohri, Pereira, and Riley, “A rational design for a weighted finite-state transducer library”, Lecture Notes in Computer Science, 1436:144-158, 1998), and ε (i.e., epsilon) removal requires it to be k-closed (which is described in more detail in the following publication, incorporated herein by reference: Mohri, “Generic epsilon-removal and input epsilon-normalization algorithms for weighted transducers”, International Journal of Foundations of Computer Science, 13(1):129-143, 2002). These properties are defined as follows:

-   (a) commutativity: $a \otimes b = b \otimes a$, $\forall a, b \in K$; and
-   (b) k-closedness: $\bigoplus_{n=0}^{k+1} a^{n} = \bigoplus_{n=0}^{k} a^{n}$, $\forall a \in K$.

The following well-known examples are all commutative semirings:

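-   (a) $\langle \mathbb{B}, +, \times, 0, 1 \rangle$: boolean semiring, with $\mathbb{B} = \{0, 1\}$ and $1 + 1 = 1$;
-   (b) $\langle \mathbb{N}, +, \times, 0, 1 \rangle$: integer semiring with the usual addition and multiplication;
-   (c) $\langle \mathbb{R}^{+}, +, \times, 0, 1 \rangle$: real positive sum times semiring;
-   (d) $\langle \overline{\mathbb{R}}^{+}, \min, +, \infty, 0 \rangle$: a real tropical semiring, where $\overline{\mathbb{R}}^{+}$ denotes $\mathbb{R}^{+} \cup \{\infty\}$.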

A number of methods for processing automata require semirings to be equipped with an order or partial order denoted by $<_{\mathcal{K}}$. Each idempotent semiring $\mathcal{K}$ (i.e., $\forall a \in K$: $a \oplus a = a$) has a natural partial order defined by $a <_{\mathcal{K}} b \Leftrightarrow a \oplus b = a$. In the above examples, the boolean and the real tropical semiring are idempotent, and hence have a natural partial order.
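By way of illustration only, the semiring definitions above may be sketched in Python as follows. The class name `Semiring`, the instance names, and the helper `satisfies_axioms_on` are hypothetical and are not part of the described embodiments; the helper merely spot-checks properties (a) through (d) on a finite sample of elements and proves nothing about the whole set $K$.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for <K, plus, times, zero, one> (section A.1)."""
    plus: Callable[[Any, Any], Any]    # collection
    times: Callable[[Any, Any], Any]   # extension
    zero: Any                          # neutral for plus, annihilator for times
    one: Any                           # neutral for times

    def is_idempotent_on(self, samples: Iterable[Any]) -> bool:
        return all(self.plus(a, a) == a for a in samples)

    def natural_leq(self, a: Any, b: Any) -> bool:
        # Natural order of an idempotent semiring: a <_K b iff a (+) b == a.
        return self.plus(a, b) == a


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
REAL_PLUS_TIMES = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def satisfies_axioms_on(sr: Semiring, samples: list) -> bool:
    """Spot-check the semiring properties (a)-(d) on the given samples."""
    for a in samples:
        if sr.plus(sr.zero, a) != a:
            return False
        if sr.times(sr.one, a) != a or sr.times(a, sr.one) != a:
            return False
        if sr.times(sr.zero, a) != sr.zero or sr.times(a, sr.zero) != sr.zero:
            return False
        for b in samples:
            if sr.plus(a, b) != sr.plus(b, a):
                return False
            for c in samples:
                left = sr.times(a, sr.plus(b, c))
                if left != sr.plus(sr.times(a, b), sr.times(a, c)):
                    return False
                right = sr.times(sr.plus(a, b), c)
                if right != sr.plus(sr.times(a, c), sr.times(b, c)):
                    return False
    return True
```

In this sketch the tropical semiring reports `TROPICAL.natural_leq(1.0, 2.5)` as true because `min(1.0, 2.5)` equals `1.0`, whereas the real sum-times semiring is not idempotent and therefore has no natural order of this kind.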

A.2 Weighted Automata

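A weighted automaton $A$ over a semiring $\mathcal{K}$ is defined as a six-tuple:

$A =_{def} \langle \Sigma, Q, I, F, E, \mathcal{K} \rangle$

with:

-   $\Sigma$ being a finite alphabet;
-   $Q$ the finite set of states;
-   $I \subseteq Q$ the set of initial states;
-   $F \subseteq Q$ the set of final states;
-   $E \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times Q$ the finite set of transitions; and
-   $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1} \rangle$ the semiring.

For any state $q \in Q$, there is defined:

-   $\lambda(q)$, with $\lambda : I \to K$, the initial weight function (with $\lambda(q) = \bar{0}$, $\forall q \notin I$);
-   $\varrho(q)$, with $\varrho : F \to K$, the final weight function (with $\varrho(q) = \bar{0}$, $\forall q \notin F$); and
-   $E(q) = \{e \mid p(e) = q\}$, the finite set of out-going transitions;

and for any transition $e \in E$, $e = \langle p, l, w, n \rangle$, there is defined:

-   $p(e)$, with $p : E \to Q$, the source state;
-   $l(e)$, with $l : E \to \Sigma \cup \{\varepsilon\}$, the label (with $\varepsilon$ being the empty string);
-   $w(e)$, with $w : E \to K$, the weight (with $w(e) \neq \bar{0}$, $\forall e \in E$); and
-   $n(e)$, with $n : E \to Q$, the target state.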

A path $\pi$ of length $r = |\pi|$ is a sequence of transitions $e_1 e_2 \cdots e_r$ such that $n(e_i) = p(e_{i+1})$ for all $i \in [\![1, r-1]\!]$. A path is said to be successful iff $p(e_1) \in I$ and $n(e_r) \in F$. In the following description only successful paths are considered. The label, $\hat{l}(\pi)$, of any successful path $\pi$ equals the concatenation of the labels of its transitions:

$\hat{l}(\pi) = l(e_1)\, l(e_2) \cdots l(e_r)$

and its “weight” or “accepting weight” $w(\pi)$ is:

$w(\pi) = \lambda(p(e_1)) \otimes \left( \bigotimes_{j \in [\![1, r]\!]} w(e_j) \right) \otimes \varrho(n(e_r))$

$\Pi(A)$ denotes the (possibly infinite) set of successful paths of $A$ and $\Pi(s)$ denotes the (possibly infinite) set of successful paths for the string $s$:

$\Pi(s) = \{\pi \mid \forall \pi \in \Pi(A),\ s = \hat{l}(\pi)\}$

$\mathcal{L}(A)$ is defined as the language of $A$. It is the (possibly infinite) set of strings $s$ having successful paths in $A$:

$\mathcal{L}(A) = \{\hat{l}(\pi) \mid \pi \in \Pi(A)\}$

The accepting weight for any string $s \in \mathcal{L}(A)$ is defined by:

$w(s) = \bigoplus_{\pi \in \Pi(s)} w(\pi)$
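The path weight and accepting weight defined above may be illustrated with the following minimal Python sketch, which assumes the real tropical semiring (collection is `min`, extension is `+`, $\bar{0}$ is infinity). The function names, the tuple encoding `(p, l, w, n)` of a transition, the dictionaries `LAM` and `RHO`, and the small example automaton are all hypothetical; path enumeration is bounded by `max_len`, so the sketch is only exact for automata whose successful paths are no longer than that bound.

```python
import math
from collections import defaultdict


def successful_paths(lam, rho, transitions, max_len=10):
    """Yield (start_state, path) for every successful path of length <= max_len.

    lam / rho map initial / final states to their weights; a transition is a
    tuple (p, l, w, n), with '' standing for the empty string epsilon.
    """
    outgoing = defaultdict(list)
    for e in transitions:
        outgoing[e[0]].append(e)
    stack = [(q, q, ()) for q in lam]
    while stack:
        start, q, path = stack.pop()
        if q in rho:
            yield start, path
        if len(path) < max_len:
            for e in outgoing[q]:
                stack.append((start, e[3], path + (e,)))


def path_weight(start, path, lam, rho):
    # Extension (times) is + in the tropical semiring.
    end = path[-1][3] if path else start
    return lam[start] + sum(e[2] for e in path) + rho[end]


def accepting_weight(s, lam, rho, transitions):
    # Collection (plus) is min; an empty collection yields zero-bar = infinity.
    weights = [path_weight(start, path, lam, rho)
               for start, path in successful_paths(lam, rho, transitions)
               if "".join(e[1] for e in path) == s]
    return min(weights, default=math.inf)


# Hypothetical automaton with two successful paths labeled "ab".
LAM = {0: 0.0}
RHO = {3: 0.5}
E = [(0, "a", 1.0, 1), (1, "b", 2.0, 3), (0, "a", 0.5, 2), (2, "b", 4.0, 3)]
```

Here the two paths for "ab" have weights 3.5 and 5.0, so `accepting_weight("ab", LAM, RHO, E)` collects them with `min`; a string with no successful path receives the weight $\bar{0}$.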

A.3 Weighted Multi-Tape Automata

In analogy to a weighted automaton, a weighted multi-tape automaton (WMTA), also called weighted n-tape automaton, over a semiring $\mathcal{K}$ is defined as a six-tuple:

$A^{(n)} =_{def} \langle \Sigma, Q, I, F, E^{(n)}, \mathcal{K} \rangle$

with $\Sigma$, $Q$, $I$, $F$, and $\mathcal{K}$ being defined as above in section A.2, and with:

-   $E^{(n)} \subseteq Q \times (\Sigma \cup \{\varepsilon\})^{n} \times Q$ being the finite set of n-tape transitions; and
-   $n$ the arity, i.e., the number of tapes in $A$.

Most of the definitions given for weighted automata are also valid for WMTAs, except that each transition $e^{(n)} \in E^{(n)}$ is labeled with:

-   $l(e^{(n)})$, with $l : E^{(n)} \to (\Sigma \cup \{\varepsilon\})^{n}$, an n-tuple of symbols.

If all symbols $\sigma \in (\Sigma \cup \{\varepsilon\})$ of a tuple are equal, the short-hand notation $\sigma^{(n)}$ may be used on the terminal symbol. For example:

$a^{(3)} = \langle a, a, a \rangle$

$\varepsilon^{(2)} = \langle \varepsilon, \varepsilon \rangle$

The label of a successful path $\pi^{(n)}$ of length $r = |\pi^{(n)}|$ equals the concatenation of the labels of its transitions, all of which must have the same arity n:

$\hat{l}(\pi^{(n)}) = l(e_1^{(n)})\, l(e_2^{(n)}) \cdots l(e_r^{(n)})$

which is an n-tuple of strings:

$\hat{l}(\pi^{(n)}) = s^{(n)} = \langle s_1, s_2, \ldots, s_n \rangle$

where each string $s_j$ is a concatenation of the j-th element of all $l(e_i^{(n)})$ of $\pi^{(n)}$ (with $i \in [\![1, r]\!]$). In anticipation of the projection operation, $\mathcal{P}_j(\,)$, (defined in section B.2) this can be expressed as:

$s_j = \mathcal{P}_j(\hat{l}(\pi^{(n)})) = \mathcal{P}_j(l(e_1^{(n)}))\, \mathcal{P}_j(l(e_2^{(n)})) \cdots \mathcal{P}_j(l(e_r^{(n)}))$

The symbols on $e^{(n)}$ are not “bound” to each other. For example, the string triple $s^{(3)} = \langle aaa, bb, ccc \rangle$ can be encoded, among others, by any of the following sequences of transitions: (a:b:c)(a:b:c)(a:ε:c) or (a:b:c)(a:ε:c)(a:b:c) or (ε:ε:c)(a:b:c)(a:b:c)(a:ε:ε), etc.
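The three encodings listed above can be checked mechanically. The following Python sketch is illustrative only: the function name `path_label` and the representation of an n-tape transition label as a tuple of strings, with the empty string standing for ε, are assumptions made for this example.

```python
def path_label(transition_labels):
    """Tape-wise concatenation of a sequence of n-tuple transition labels."""
    n = len(transition_labels[0])
    return tuple("".join(label[j] for label in transition_labels)
                 for j in range(n))


# The three transition sequences named in the text ('' stands for epsilon).
encoding_1 = [("a", "b", "c"), ("a", "b", "c"), ("a", "", "c")]
encoding_2 = [("a", "b", "c"), ("a", "", "c"), ("a", "b", "c")]
encoding_3 = [("", "", "c"), ("a", "b", "c"), ("a", "b", "c"), ("a", "", "")]
```

All three sequences yield the same string triple, which illustrates that the alignment of symbols across tapes within a path does not affect the n-tuple of strings that the path encodes.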

$\Pi(s^{(n)})$ denotes the (possibly infinite) set of successful paths for the n-tuple of strings $s^{(n)}$:

$\Pi(s^{(n)}) = \{\pi^{(n)} \mid \forall \pi^{(n)} \in \Pi(A^{(n)}),\ s^{(n)} = \hat{l}(\pi^{(n)})\}$

$\mathcal{L}(A^{(n)})$ is called the n-tape language of $A^{(n)}$ (which may also be referred to as a relation of arity n). It is the (possibly infinite) set of n-tuples of strings $s^{(n)}$ having successful paths in $A^{(n)}$:

$\mathcal{L}(A^{(n)}) = \{l^{(n)} \mid l^{(n)} = \hat{l}(\pi^{(n)}),\ \forall \pi^{(n)} \in \Pi(A^{(n)})\}$

The accepting weight for any n-tuple of strings $s^{(n)} \in \mathcal{L}(A^{(n)})$ is defined by:

$w(s^{(n)}) = \bigoplus_{\pi^{(n)} \in \Pi(s^{(n)})} w(\pi^{(n)})$

It is possible to define an arbitrary weighted relation between the different tapes of $\mathcal{L}(A^{(n)})$. For example, $\mathcal{L}(A^{(2)})$ of the weighted transducer (i.e., two-tape automaton) $A^{(2)}$ is usually considered as a weighted relation between its two tapes, $\mathcal{P}_1(\mathcal{L}(A^{(2)}))$ and $\mathcal{P}_2(\mathcal{L}(A^{(2)}))$. One of the two tapes is considered to be the input tape, the other the output tape.

B. Operations on Multi-Tape Automata

All operations defined in this section are defined on symbol tuples, string tuples, or n-tape languages, taking their accepting weights into account. Whenever these operations are used on transitions, paths, or automata, they are actually applied to their labels or languages respectively. For example, the binary operation $\ddot{o}$ on two automata, $A_1^{(n)}\, \ddot{o}\, A_2^{(n)}$, actually means $\mathcal{L}(A_1^{(n)}\, \ddot{o}\, A_2^{(n)}) = \mathcal{L}(A_1^{(n)})\, \ddot{o}\, \mathcal{L}(A_2^{(n)})$, and the unary operation $\dot{o}$ on one automaton, $\dot{o}\, A^{(n)}$, actually means $\mathcal{L}(\dot{o}\, A^{(n)}) = \dot{o}\, \mathcal{L}(A^{(n)})$.

B.1 Pairing and Concatenation

The pairing of two string tuples, $s^{(n)} : v^{(m)} = u^{(n+m)}$, and its accepting weight are defined as:

$\langle s_1, \ldots, s_n \rangle : \langle v_1, \ldots, v_m \rangle =_{def} \langle s_1, \ldots, s_n, v_1, \ldots, v_m \rangle$

$w(\langle s_1, \ldots, s_n \rangle : \langle v_1, \ldots, v_m \rangle) =_{def} w(\langle s_1, \ldots, s_n \rangle) \otimes w(\langle v_1, \ldots, v_m \rangle)$

1-tuples of strings are not distinguished herein from strings, and hence, instead of writing $s^{(1)} : v^{(1)}$ or $\langle s \rangle : \langle v \rangle$, $s : v$ is simply written. Strings that contain only one symbol $\sigma \in (\Sigma \cup \{\varepsilon\})$ are likewise not distinguished from that symbol, and the pairing $\sigma_1 : \sigma_2$ is written instead.

Pairing is associative:

$s_1^{(n_1)} : s_2^{(n_2)} : s_3^{(n_3)} = (s_1^{(n_1)} : s_2^{(n_2)}) : s_3^{(n_3)} = s_1^{(n_1)} : (s_2^{(n_2)} : s_3^{(n_3)}) = s^{(n_1 + n_2 + n_3)}$

The concatenation of two string tuples of equal arity, $s^{(n)} v^{(n)} = u^{(n)}$, and its accepting weight are defined as:

$\langle s_1, \ldots, s_n \rangle \langle v_1, \ldots, v_n \rangle =_{def} \langle s_1 v_1, \ldots, s_n v_n \rangle$

$w(\langle s_1, \ldots, s_n \rangle \langle v_1, \ldots, v_n \rangle) =_{def} w(\langle s_1, \ldots, s_n \rangle) \otimes w(\langle v_1, \ldots, v_n \rangle)$

Again, 1-tuples of strings are not distinguished herein from strings, and hence, instead of writing $s^{(1)} v^{(1)}$ or $\langle s \rangle \langle v \rangle$, $sv$ is simply written. Strings that contain only one symbol $\sigma \in (\Sigma \cup \{\varepsilon\})$ are likewise not distinguished from that symbol, and the concatenation $\sigma_1 \sigma_2$ is written instead.

Concatenation is associative:

$s_1^{(n)} s_2^{(n)} s_3^{(n)} = (s_1^{(n)} s_2^{(n)}) s_3^{(n)} = s_1^{(n)} (s_2^{(n)} s_3^{(n)}) = s^{(n)}$

The relation between pairing and concatenation can be expressed through a matrix of string tuples $s_{jk}^{(n_j)}$ given by:

$\begin{bmatrix} s_{11}^{(n_1)} & \cdots & s_{1r}^{(n_1)} \\ \vdots & & \vdots \\ s_{m1}^{(n_m)} & \cdots & s_{mr}^{(n_m)} \end{bmatrix}$

that are horizontally concatenated and vertically paired:

$\begin{aligned} s^{(n_1 + \ldots + n_m)} &= (s_{11}^{(n_1)} \cdots s_{1r}^{(n_1)}) : \cdots : (s_{m1}^{(n_m)} \cdots s_{mr}^{(n_m)}) \\ &= (s_{11}^{(n_1)} : \cdots : s_{m1}^{(n_m)}) \cdots (s_{1r}^{(n_1)} : \cdots : s_{mr}^{(n_m)}) \end{aligned}$

where the equation above does not hold for the accepting weights unless they are defined over a commutative semiring $\mathcal{K}$.
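The interchange of pairing and concatenation stated above can be illustrated on unweighted string tuples with a short Python sketch. The function names `pair` and `concat` and the 2-by-2 example matrix are hypothetical; weights are deliberately left out, since the identity only carries over to them in a commutative semiring.

```python
def pair(*tuples):
    """Pairing s:v:... places the tapes of the operands side by side."""
    return tuple(s for t in tuples for s in t)


def concat(*tuples):
    """Tape-wise concatenation of string tuples of equal arity."""
    if len({len(t) for t in tuples}) != 1:
        raise ValueError("all tuples must have the same arity")
    return tuple("".join(parts) for parts in zip(*tuples))


# Hypothetical matrix: each row holds tuples of one arity n_j.
matrix = [
    [("a",), ("b",)],            # row 1: arity n_1 = 1
    [("x", "y"), ("z", "w")],    # row 2: arity n_2 = 2
]

# Concatenate each row horizontally, then pair the results vertically.
rows_first = pair(*(concat(*row) for row in matrix))

# Pair each column vertically, then concatenate the results horizontally.
cols_first = concat(*(pair(*col) for col in zip(*matrix)))
```

Both orders of evaluation give the same string triple of arity $n_1 + n_2 = 3$.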

B.2 Projection and Complementary Projection

A projection $\mathcal{P}_{j,k,\ldots}(s^{(n)})$ retains only those strings (i.e., tapes) of the tuple $s^{(n)}$ that are specified by the indices $j, k, \ldots \in [\![1, n]\!]$, and places them in the specified order. The projection and its accepting weight are defined as:

$\mathcal{P}_{j,k,\ldots}(\langle s_1, \ldots, s_n \rangle) =_{def} \langle s_j, s_k, \ldots \rangle$

$w(\mathcal{P}_{j,k,\ldots}(\langle s_1, \ldots, s_n \rangle)) =_{def} w(\langle s_1, \ldots, s_n \rangle)$

where the weights are not modified by the projection. Projection indices can occur in any order and more than once. Thus, the tapes of $s^{(n)}$ can, for example, be reversed or duplicated:

$\mathcal{P}_{n,\ldots,1}(\langle s_1, \ldots, s_n \rangle) = \langle s_n, \ldots, s_1 \rangle$

$\mathcal{P}_{j,j,j}(\langle s_1, \ldots, s_n \rangle) = \langle s_j, s_j, s_j \rangle$

The relation between projection and pairing, and between their respective accepting weights, is:

$s^{(n)} = \mathcal{P}_1(s^{(n)}) : \cdots : \mathcal{P}_n(s^{(n)})$

$w(s^{(n)}) \neq w(\mathcal{P}_1(s^{(n)}) : \cdots : \mathcal{P}_n(s^{(n)})) = \underbrace{w(s^{(n)}) \otimes \cdots \otimes w(s^{(n)})}_{n \text{ times}}$

A complementary projection $\overline{\mathcal{P}}_{j,k,\ldots}(s^{(n)})$ removes those strings (i.e., tapes) of the tuple $s^{(n)}$ that are specified by the indices $j, k, \ldots \in [\![1, n]\!]$, and preserves all other strings in their original order. Complementary projection and its accepting weight are defined as:

$\overline{\mathcal{P}}_{j,k,\ldots}(\langle s_1, \ldots, s_n \rangle) =_{def} \langle \ldots, s_{j-1}, s_{j+1}, \ldots, s_{k-1}, s_{k+1}, \ldots \rangle$

$w(\overline{\mathcal{P}}_{j,k,\ldots}(\langle s_1, \ldots, s_n \rangle)) =_{def} w(\langle s_1, \ldots, s_n \rangle)$

Complementary projection indices can occur only once, but in any order.

The projection of an n-tape language is the projection of all its string tuples, and the complementary projection of an n-tape language is defined in the same way, respectively as:

$\mathcal{P}_{j,k,\ldots}(\mathcal{L}^{(n)}) = \{\mathcal{P}_{j,k,\ldots}(s^{(n)}) \mid \forall s^{(n)} \in \mathcal{L}^{(n)}\}$

$\overline{\mathcal{P}}_{j,k,\ldots}(\mathcal{L}^{(n)}) = \{\overline{\mathcal{P}}_{j,k,\ldots}(s^{(n)}) \mid \forall s^{(n)} \in \mathcal{L}^{(n)}\}$
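For illustration, projection and complementary projection on a single unweighted string tuple may be sketched in Python as follows. The function names `project` and `complement_project` are hypothetical; tape indices are 1-based, as in the definitions above, and weights are omitted because neither operation modifies them.

```python
def project(s, *indices):
    """P_{j,k,...}: keep the named tapes, in the given order (repeats allowed)."""
    return tuple(s[j - 1] for j in indices)


def complement_project(s, *indices):
    """P-bar_{j,k,...}: drop the named tapes, keep the rest in original order."""
    if len(set(indices)) != len(indices):
        raise ValueError("each tape index may occur only once")
    removed = set(indices)
    return tuple(x for i, x in enumerate(s, start=1) if i not in removed)


s = ("ab", "cd", "ef")   # hypothetical string triple
```

For example, `project(s, 3, 2, 1)` reverses the tapes, `project(s, 2, 2, 2)` triplicates tape 2, and `complement_project(s, 3, 1)` leaves only tape 2.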

B.3 Cross-Product

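The cross-product of an n-tape language $\mathcal{L}_1^{(n)}$ and an m-tape language $\mathcal{L}_2^{(m)}$ is defined as:

$\mathcal{L}_1^{(n)} \times \mathcal{L}_2^{(m)} =_{def} \{s^{(n)} : v^{(m)} \mid \forall s^{(n)} \in \mathcal{L}_1^{(n)},\ \forall v^{(m)} \in \mathcal{L}_2^{(m)}\}$

The accepting weight of each string tuple in $\mathcal{L}_1^{(n)} \times \mathcal{L}_2^{(m)}$ follows from the definition of pairing. The cross-product operation is associative.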

A well-known example (and special case) is the cross-product of two acceptors (i.e., 1-tape automata) leading to a transducer (i.e., a 2-tape automaton):

$A^{(2)} = A_1^{(1)} \times A_2^{(1)}$

$\mathcal{L}(A_1^{(1)} \times A_2^{(1)}) = \{s : v \mid \forall s \in \mathcal{L}(A_1^{(1)}),\ \forall v \in \mathcal{L}(A_2^{(1)})\}$

$w(s : v) = w_{A_1}(s) \otimes w_{A_2}(v)$
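On finite languages the cross-product can be computed directly from its definition. The Python sketch below is illustrative only and does not reflect the automaton-level methods of section C.1: a weighted language is assumed to be a dictionary from string tuples to weights, the function name `cross_product` is hypothetical, and extension defaults to ordinary multiplication (the real sum-times semiring).

```python
import operator
from itertools import product


def cross_product(lang1, lang2, times=operator.mul):
    """Pair every tuple of lang1 with every tuple of lang2.

    The weight of each paired tuple is the extension (times) of the
    weights of its two operands, as in the definition of pairing.
    """
    return {s + v: times(w1, w2)
            for (s, w1), (v, w2) in product(lang1.items(), lang2.items())}


# Hypothetical acceptors given as finite weighted languages of 1-tuples.
acceptor_1 = {("ab",): 0.5, ("c",): 0.25}
acceptor_2 = {("x",): 0.5}

transducer = cross_product(acceptor_1, acceptor_2)
```

The result pairs every string of the first acceptor with every string of the second, yielding a finite 2-tape language of `len(acceptor_1) * len(acceptor_2)` tuples.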

B.4 Auto-Intersection

Operations for auto-intersection are described in this section. More specifically, this section describes operations for performing auto-intersection on string tuples and languages.

The auto-intersection $\mathcal{I}_{j,k}(s^{(n)})$ on string tuples succeeds if the two strings $s_j$ and $s_k$ of the tuple $s^{(n)} = \langle s_1, \ldots, s_n \rangle$ are equal (with $j, k \in [\![1, n]\!]$), and fails otherwise ($\bot$). Auto-intersection and its accepting weight are defined as:

$\mathcal{I}_{j,k}(s^{(n)}) =_{def} \begin{cases} s^{(n)} & \text{for } s_j = s_k \\ \bot & \text{for } s_j \neq s_k \end{cases}$

$w(\mathcal{I}_{j,k}(s^{(n)})) =_{def} \begin{cases} w(s^{(n)}) & \text{for } s_j = s_k \\ \bar{0} & \text{for } s_j \neq s_k \end{cases}$

This means the weight of a successfully auto-intersected string tuple is not modified, whereas the weight of a string tuple where the auto-intersection failed is $\bar{0}$, which corresponds to the invalidation or elimination of that tuple.

FIG. 1 is a flow diagram that sets forth steps for performing the auto-intersection operation on a first tape and a second tape of a path of a weighted multi-tape automaton. At 102, a string tuple $\langle s_1, \ldots, s_n \rangle$ is generated that has a string s for each of the n tapes of a selected path of the WMTA. At 104, the string $s_j$ of the first tape is compared with the string $s_k$ of the second tape in the string tuple. If at 106, the strings $s_j$ and $s_k$ are equal, then the string tuple is retained in the WMTA at 108; otherwise, the WMTA is restructured to remove the string tuple at 110. At 112, if the last of the paths of the WMTA has been processed, then auto-intersection completes; otherwise, it continues with the next selected path of the WMTA at 102.

For example, performing the auto-intersection operation $\mathcal{I}_{1,3}(s^{(3)})$ on tapes 1 and 3 of the example three-tape WMTA shown in FIG. 2 fails for the first of its two paths, which produces a first string tuple $\langle ax, by, az \rangle$ (with weight w₁w₂), but succeeds for the second of its two paths, which produces a second string tuple $\langle ax, by, ax \rangle$ (with weight w₁w₃). The final state of the WMTA has weight w₄. Because the strings of tape 1 (i.e., “ax”) and tape 3 (i.e., “az”) of the first string tuple are not equal, unlike the strings of tape 1 (i.e., “ax”) and tape 3 (i.e., “ax”) of the second string tuple, the WMTA shown in FIG. 2 is restructured to remove the arc x:y:z/w₂ of the first path, because no other successful path in the WMTA depends on that arc. The result of the auto-intersection of the two selected tapes 1 and 3 of the WMTA is to filter out all string tuples of the WMTA except for those that have strings that are equal on the two selected tapes.
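The effect of the flow of FIG. 1 on the two string tuples of this example can be sketched in Python as follows. The sketch operates on an explicit finite set of weighted string tuples rather than on the automaton itself, so it does not show the restructuring of arcs; the function name `auto_intersect` and the numeric values chosen for the weights w₁ through w₄ are assumptions made only for this illustration, with extension taken to be ordinary multiplication.

```python
W1, W2, W3, W4 = 0.5, 0.25, 0.125, 1.0   # hypothetical numeric weights

# The two string tuples of the example, each weighted by its arcs and
# by the final weight (extension is multiplication in this sketch).
example_language = {
    ("ax", "by", "az"): W1 * W2 * W4,
    ("ax", "by", "ax"): W1 * W3 * W4,
}


def auto_intersect(language, j, k):
    """Keep only tuples whose tapes j and k (1-based) carry equal strings.

    Retained tuples keep their weight unchanged; rejected tuples are dropped,
    which corresponds to assigning them the weight zero-bar.
    """
    return {s: w for s, w in language.items() if s[j - 1] == s[k - 1]}


result = auto_intersect(example_language, 1, 3)
```

Only the second tuple survives, with its weight unmodified.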

More generally, auto-intersection of a language, $\mathcal{I}_{j,k}(\mathcal{L}^{(n)})$, equals the auto-intersection of all of its string tuples such that only the successfully auto-intersected string tuples are retained, which may be defined as:

$\mathcal{I}_{j,k}(\mathcal{L}^{(n)}) =_{def} \{s^{(n)} \mid s_j = s_k,\ s^{(n)} \in \mathcal{L}^{(n)}\}$

$\mathcal{I}_{j,k}(\mathcal{L}^{(n)}) = \{\mathcal{I}_{j,k}(s^{(n)}) \mid s^{(n)} \in \mathcal{L}^{(n)}\}$

$\mathcal{I}_{j,k}(\mathcal{L}^{(n)}) \subseteq \mathcal{L}^{(n)}$

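For example, given a language $\mathcal{L}_1^{(3)}$ (where “*” denotes the Kleene star):

$\mathcal{L}_1^{(3)} = \langle a, x, \varepsilon \rangle \langle b, y, a \rangle^{*} \langle \varepsilon, z, b \rangle = \langle ab^{*}, xy^{*}z, a^{*}b \rangle$,

the result of its auto-intersection $\mathcal{I}_{1,3}(\mathcal{L}_1^{(3)})$ of tapes 1 and 3 is:

$\mathcal{I}_{1,3}(\mathcal{L}_1^{(3)}) = \{\langle ab, xyz, ab \rangle\}$

which means the auto-intersection admits a single iteration through the cycle $\langle b, y, a \rangle^{*}$ (i.e., “*” takes only the value 1).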
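This example can be reproduced by enumerating the language up to a bounded number of iterations of the cycle and filtering on tapes 1 and 3. The following Python sketch is illustrative only; the function names and the bound of five iterations are assumptions, and a bounded enumeration is used because the language itself is infinite.

```python
def concat(*tuples):
    """Tape-wise concatenation of string tuples of equal arity ('' = epsilon)."""
    if len({len(t) for t in tuples}) != 1:
        raise ValueError("all tuples must have the same arity")
    return tuple("".join(parts) for parts in zip(*tuples))


def bounded_language(max_iterations):
    """Tuples <a,x,eps><b,y,a>^i<eps,z,b> for i = 0 .. max_iterations."""
    return [concat(("a", "x", ""), *[("b", "y", "a")] * i, ("", "z", "b"))
            for i in range(max_iterations + 1)]


def auto_intersect(tuples, j, k):
    """Keep the tuples whose tapes j and k (1-based) are equal."""
    return [s for s in tuples if s[j - 1] == s[k - 1]]


result = auto_intersect(bounded_language(5), 1, 3)
```

The bound does not hide further solutions: for i iterations tape 1 reads a followed by i copies of b while tape 3 reads i copies of a followed by b, and these two strings coincide only for i = 1.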

B.5 Single-Tape Intersection

Single-tape intersection of two multi-tape languages, $\mathcal{L}_1^{(n)}$ and $\mathcal{L}_2^{(m)}$, is based on one single tape j and k from each of their sets of n and m tapes, respectively, and may be defined as:

$\mathcal{L}_1^{(n)} \bigcap_{j,k} \mathcal{L}_2^{(m)} =_{def} \overline{\mathcal{P}}_{n+k}(\mathcal{I}_{j,n+k}(\mathcal{L}_1^{(n)} \times \mathcal{L}_2^{(m)}))$.

The single-tape intersection operation pairs each string tuple $s^{(n)} \in \mathcal{L}_1^{(n)}$ with each string tuple $v^{(m)} \in \mathcal{L}_2^{(m)}$ iff $s_j = v_k$. The resulting language $\mathcal{L}^{(n+m-1)}$ is defined as:

$\mathcal{L}^{(n+m-1)} = \{u^{(n+m-1)} \mid u^{(n+m-1)} = \overline{\mathcal{P}}_{n+k}(s^{(n)} : v^{(m)}),\ s^{(n)} \in \mathcal{L}_1^{(n)},\ v^{(m)} \in \mathcal{L}_2^{(m)},\ s_j = v_k\}$

with weight $w$:

$w(u^{(n+m-1)}) = w(s^{(n)}) \otimes w(v^{(m)})$.

Single-tape intersection can intuitively be understood as a “composition” of the two languages such that tape j of $\mathcal{L}_1^{(n)}$ is intersected with tape k of $\mathcal{L}_2^{(m)}$. Tape k, which due to the intersection becomes equal to tape j, is then removed, and all other tapes of both languages are preserved without modification. For example, if one language contains the ordered pair $\langle x, y \rangle$ and another language contains the ordered pair $\langle y, z \rangle$, then composing $\langle x, y \rangle$ and $\langle y, z \rangle$, in that order, results in a language that contains the ordered pair $\langle x, z \rangle$, whereas intersecting $\langle x, y \rangle$ and $\langle y, z \rangle$, at tapes 2 and 1 respectively, results in a language that contains the ordered triple $\langle x, y, z \rangle$.
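The definition of single-tape intersection as a complementary projection of an auto-intersection of a cross-product can be followed literally on finite languages. The Python sketch below is illustrative only: weighted languages are assumed to be dictionaries from string tuples to weights, all function names are hypothetical, and extension defaults to ordinary multiplication.

```python
import operator
from itertools import product


def cross_product(lang1, lang2, times=operator.mul):
    return {s + v: times(w1, w2)
            for (s, w1), (v, w2) in product(lang1.items(), lang2.items())}


def auto_intersect(language, j, k):
    return {s: w for s, w in language.items() if s[j - 1] == s[k - 1]}


def single_tape_intersect(lang1, lang2, j, k, times=operator.mul):
    """Intersect tape j of lang1 with tape k of lang2 (1-based indices)."""
    if not lang1 or not lang2:
        return {}
    n = len(next(iter(lang1)))                      # arity of lang1
    crossed = cross_product(lang1, lang2, times)    # arity n + m
    kept = auto_intersect(crossed, j, n + k)        # tape j equals tape n + k
    # Complementary projection: remove the now redundant tape n + k.  No two
    # kept tuples can merge here, because the removed tape duplicates tape j.
    return {s[:n + k - 1] + s[n + k:]: w for s, w in kept.items()}


# The pairs named in the text, with hypothetical weights, plus one
# hypothetical non-matching tuple.
lang_a = {("x", "y"): 0.5}
lang_b = {("y", "z"): 0.5, ("q", "z"): 0.25}

result = single_tape_intersect(lang_a, lang_b, 2, 1)
```

The intersection at tapes 2 and 1 retains the shared tape once, giving a triple of arity n + m − 1 = 3.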

Single-tape intersection is neither associative nor commutative, except for special cases. A first special case of single-tape intersection is the intersection of two acceptors (i.e., 1-tape automata) leading to an acceptor, which may be defined as:

$A_1^{(1)} \cap A_2^{(1)} = A_1^{(1)} \bigcap_{1,1} A_2^{(1)} = \overline{\mathcal{P}}_2(\mathcal{I}_{1,2}(A_1^{(1)} \times A_2^{(1)}))$

where the first special case of single-tape intersection has the language $\mathcal{L}$ and weight $w$:

$\mathcal{L}(A_1^{(1)} \cap A_2^{(1)}) = \{s \mid s \in \mathcal{L}(A_1),\ s \in \mathcal{L}(A_2)\}$

$w(s) = w_{A_1}(s) \otimes w_{A_2}(s)$

and where single-tape intersection has the same language:

$\mathcal{L}(A_1^{(1)} \times A_2^{(1)}) = \{\langle s_1, s_2 \rangle \mid s_1 \in \mathcal{L}(A_1),\ s_2 \in \mathcal{L}(A_2)\}$

$\mathcal{L}(\mathcal{I}_{1,2}(A_1^{(1)} \times A_2^{(1)})) = \{\langle s, s \rangle \mid s \in \mathcal{L}(A_1),\ s \in \mathcal{L}(A_2)\}$

$\mathcal{L}(\overline{\mathcal{P}}_2(\mathcal{I}_{1,2}(A_1^{(1)} \times A_2^{(1)}))) = \{s \mid s \in \mathcal{L}(A_1),\ s \in \mathcal{L}(A_2)\}$

$w(s) = w(\langle s, s \rangle) = w(\langle s_1, s_2 \rangle) = w_{A_1}(s) \otimes w_{A_2}(s)$

A second special case of single-tape intersection is the composition of two transducers (i.e., 2-tape automata) leading to a transducer. The second special case of single-tape intersection requires an additional complementary projection and may be defined as:

$A_1^{(2)} \diamond A_2^{(2)} = \overline{\mathcal{P}}_2(A_1^{(2)} \bigcap_{2,1} A_2^{(2)}) = \overline{\mathcal{P}}_{2,3}(\mathcal{I}_{2,3}(A_1^{(2)} \times A_2^{(2)}))$.

FIG. 3 is a flow diagram that sets forth steps for performing a single-tape intersection operation of a first WMTA and a second WMTA. At 302, a cross-product WMTA is computed using the first WMTA and the second WMTA. At 304, a string tuple for each path of the cross-product WMTA is generated. At 306 and 316, for each string tuple generated at 304, the string of a first selected tape is compared with the string of a second selected tape at 308. If at 310, the strings compared at 308 are equal, then the corresponding string tuple is retained in the cross-product WMTA at 312; otherwise, the cross-product WMTA is restructured at 314 to remove the corresponding string tuple. When the strings of the last tuple have been compared at 316, redundant strings retained in the string tuples at 312 are removed in the cross-product WMTA at 318.

For example, FIG. 4 presents two WMTAs A₁⁽³⁾ (with string tuple <ax, by, cz> and weight w₁⊗w₂, and string tuple <ae, bf, cg> and weight w₁⊗w₃) and A₂⁽²⁾ (with string tuple <aa, cg> and weight w₄⊗w₅) for illustrating a simple example of the single-tape intersection operation $A_1^{(3)} \bigcap_{3,2} A_2^{(2)}$ (where the final state of each automaton also has a weight). The cross-product WMTA A₃⁽⁵⁾ of the two WMTAs A₁⁽³⁾ and A₂⁽²⁾ contains the following two string tuples (see 302 in FIG. 3): <ax, by, cz, aa, cg> having weight w₁⊗w₂⊗w₄⊗w₅ and <ae, bf, cg, aa, cg> having weight w₁⊗w₃⊗w₄⊗w₅. In comparing tapes 3 and 5 of the cross-product WMTA A₃⁽⁵⁾ (see 308 in FIG. 3), the following strings are compared for each string tuple of the cross-product WMTA A₃⁽⁵⁾, respectively: “cz” and “cg”; and “cg” and “cg”.

In the example shown in FIG. 4, one pair of strings at the selected tapes 3 and 5 of the cross-product WMTA A₃⁽⁵⁾ is equal, which results in the retention of the string tuple <ae, bf, cg, aa, cg> in the cross-product WMTA A₃⁽⁵⁾ (see 312 in FIG. 3). Also in the example, one pair of strings at the selected tapes 3 and 5 in the cross-product WMTA A₃⁽⁵⁾ is not equal, resulting in the restructuring of the cross-product WMTA to remove the string tuple <ax, by, cz, aa, cg> through auto-intersection as WMTA $A_4^{(5)} = \mathcal{I}_{3,5}(A_3^{(5)})$ (see 314 in FIG. 3). Finally, redundant strings in the string tuple retained in the cross-product WMTA A₄⁽⁵⁾ are removed through complementary projection $A_5^{(4)} = \overline{\mathcal{P}}_5(A_4^{(5)})$ to result in the simplified string tuple <ae, bf, cg, aa> for the WMTA A₅⁽⁴⁾ (see 318 in FIG. 3).
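The three steps of the FIG. 4 example (cross-product, auto-intersection, complementary projection) can be sketched on finite sets of weighted string tuples. This is only an illustration under stated assumptions: languages are represented as Python dictionaries from string tuples to weights rather than as automata, the semiring product ⊗ is taken to be ordinary multiplication, the numeric weights stand in for the symbolic weights of FIG. 4, and all function names are hypothetical.

```python
from itertools import product

def cross_product(lang1, lang2):
    """Pair every tuple of lang1 with every tuple of lang2; weights are multiplied."""
    return {s + v: w1 * w2
            for (s, w1), (v, w2) in product(lang1.items(), lang2.items())}

def auto_intersect(lang, j, k):
    """Keep only the tuples whose strings on tapes j and k (1-based) are equal."""
    return {t: w for t, w in lang.items() if t[j - 1] == t[k - 1]}

def remove_tape(lang, k):
    """Complementary projection: drop tape k (1-based) from every tuple.
    After auto-intersection the dropped tape duplicates another tape,
    so two distinct tuples cannot collapse into one here."""
    return {t[:k - 1] + t[k:]: w for t, w in lang.items()}

def single_tape_intersect(lang1, lang2, j, k):
    """Intersect tape j of lang1 with tape k of lang2 (lang1 assumed non-empty)."""
    n = len(next(iter(lang1)))          # number of tapes of the first language
    crossed = cross_product(lang1, lang2)
    matched = auto_intersect(crossed, j, n + k)
    return remove_tape(matched, n + k)

# Stand-in numeric weights for the two tuples of A1 and the one tuple of A2.
A1 = {("ax", "by", "cz"): 0.5, ("ae", "bf", "cg"): 0.25}
A2 = {("aa", "cg"): 0.5}

result = single_tape_intersect(A1, A2, 3, 2)
print(result)  # {('ae', 'bf', 'cg', 'aa'): 0.125}
```

Only the tuple whose third string equals the second string of the A₂ tuple survives, and its weight is the product of the two paired weights.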

B.6 Multi-Tape Intersection

Multi-tape intersection of two multi-tape languages, $\mathcal{L}_1^{(n)}$ and $\mathcal{L}_2^{(m)}$, uses r tapes in each language, and intersects them pair-wise. In other words, multi-tape intersection is an operation that involves intersecting several tapes of a first MTA with the same number of tapes of a second MTA. Multi-tape intersection is a generalization of single-tape intersection, and is defined as:

$\mathcal{L}_1^{(n)} \bigcap_{j_1,k_1 \ldots j_r,k_r} \mathcal{L}_2^{(m)} =_{def} \overline{\mathcal{P}}_{n+k_1,\ldots,n+k_r}(\mathcal{I}_{j_r,n+k_r}(\cdots \mathcal{I}_{j_1,n+k_1}(\mathcal{L}_1^{(n)} \times \mathcal{L}_2^{(m)}) \cdots))$

The multi-tape intersection operation pairs each string tuple $s^{(n)} \in \mathcal{L}_1^{(n)}$ with each string tuple $v^{(m)} \in \mathcal{L}_2^{(m)}$ iff $s_{j_1} = v_{k_1}$ until $s_{j_r} = v_{k_r}$. The resulting language $\mathcal{L}^{(n+m-r)}$ is defined as:

$\mathcal{L}^{(n+m-r)} = \{ u^{(n+m-r)} \mid u^{(n+m-r)} = \overline{\mathcal{P}}_{n+k_1,\ldots,n+k_r}(s^{(n)} : v^{(m)}),\ s^{(n)} \in \mathcal{L}_1^{(n)},\ v^{(m)} \in \mathcal{L}_2^{(m)},\ s_{j_1} = v_{k_1}, \ldots, s_{j_r} = v_{k_r} \}$

with weight w:

$w(u^{(n+m-r)}) = w(s^{(n)}) \otimes w(v^{(m)})$.

All tapes $k_i$ of language $\mathcal{L}_2^{(m)}$ that have directly participated in the intersection are afterwards equal to the tapes $j_i$ of $\mathcal{L}_1^{(n)}$, and are removed. Multi-tape intersection is neither associative nor commutative (except for special cases).
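As a rough illustration of the definition above, the pairwise matching of r tapes and the removal of the matched tapes of the second language can be written for finite sets of weighted string tuples. The sketch assumes dictionaries from string tuples to weights, ordinary multiplication as ⊗, and a non-empty first language; the name `multi_tape_intersect` and the sample data are hypothetical.

```python
from itertools import product

def multi_tape_intersect(lang1, lang2, tape_pairs):
    """Intersect tapes j_i of lang1 with tapes k_i of lang2.

    tape_pairs is a list of (j_i, k_i) pairs, 1-based. The matched tapes
    of lang2 are removed from the result, so each result tuple has
    n + m - r strings, where r = len(tape_pairs)."""
    n = len(next(iter(lang1)))                 # tapes of the first language
    dropped = {n + k for _, k in tape_pairs}   # positions removed from s:v
    result = {}
    for (s, w1), (v, w2) in product(lang1.items(), lang2.items()):
        # Pair s with v only if every selected pair of tapes agrees.
        if all(s[j - 1] == v[k - 1] for j, k in tape_pairs):
            paired = s + v
            kept = tuple(x for pos, x in enumerate(paired, start=1)
                         if pos not in dropped)
            result[kept] = w1 * w2
    return result

L1 = {("a", "b", "c"): 0.5, ("a", "x", "c"): 0.5}
L2 = {("b", "c", "d"): 0.5}

# Intersect tape 2 of L1 with tape 1 of L2, and tape 3 of L1 with tape 2 of L2.
out = multi_tape_intersect(L1, L2, [(2, 1), (3, 2)])
print(out)  # {('a', 'b', 'c', 'd'): 0.25}
```

With a single pair in `tape_pairs`, the function reduces to single-tape intersection.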

B.7 Transition Automata and Transition-Wise Processing

A transition automaton A(e) is defined herein as an automaton containing only one single transition e that actually belongs to another automaton A. Any automaton operation is allowed to be performed on A(e), which means that the operation is performed on one single transition of A rather than on A. This can be thought of either as the operation being performed in place (i.e., inside A on one single transition) or as the transition being extracted from A and placed into a new automaton A(e) that participates in an operation whose result is then placed back into A, at the original location of e.

The concept of transition automata allows a method to be defined where an automaton A is transition-wise processed (i.e., each of its transitions is processed independently through a sequence of automaton operations). In the following example that illustrates the transition automata A(e) being transition-wise processed, e is one transition in a set of transitions E of automaton A:

    for ∀e ∈ E do
        A(e) ← A₁⁽²⁾ ⋄ A(e) ⋄ A₂⁽²⁾
        A(e) ← ... A(e) ...

C. Methods for Performing MTA Operations

This section sets forth methods for performing multi-tape operations for automata defined in section B, while referring to the variables and definitions in Table 1. Note that in Table 1 the following variables serve for temporarily assigning additional data to a state q: ν[q], μ[q], Ψ[q], Θ[q], ξ[q], and χ[q].

TABLE 1

A_j = <Σ_j, Q_j, i_j, F_j, E_j, κ_j> : Specific (original) weighted automaton from which a new weighted automaton A is constructed
A = <Σ, Q, i, F, E, κ> : New weighted automaton resulting from the construction
ν[q] = q₁ : State q₁ of an original automaton A₁ assigned to a state q of a new automaton A
μ[q] = (q₁, q₂) : Pair of states (q₁, q₂) of two original automata, A₁ and A₂, assigned to a state q of a new automaton A
Ψ[q] = q̈ : Previous state q̈ in the new automaton A on the same path as q (back pointer)
Θ[q] = (q₁, q₂, q_ε) : Triple of states q₁, q₂, q_ε belonging to the original automata, A₁ and A₂, and to a simulated filter automaton, A_ε, respectively; assigned to a state q of a new automaton A
ξ[q] = (s, u) : Pair of “leftover” substrings (s, u) assigned to a state q of a new automaton A
lcp(s, s′) : Longest common prefix of the strings s and s′
l_{j,k,...}(x) = P_{j,k,...}(l(x)) : Short-hand notation for the projection of the label x
δ(s, u) = |s| − |u| : Delay between two strings (or leftover substrings) s and u, where |s| is the length of string s and |u| is the length of string u. Example: δ(ξ[q])
χ[q] = (χ₁, χ₂) : Pair of integers assigned to a state q, expressing the lengths of two strings s and u on different tapes of the same path ending at q

C.1 Cross-Product

Generally, this section sets forth two alternate embodiments for performing the cross-product operation defined in section B.3, and more specifically for compiling the cross product of two WMTAs, $A_1^{(n)}$ and $A_2^{(m)}$.

The first embodiment pairs the label of each transition e₁ ∈ E₁ with $\varepsilon^{(m)}$ (producing $l(e_1):\varepsilon^{(m)}$), and the label of each transition e₂ ∈ E₂ with $\varepsilon^{(n)}$ (producing $\varepsilon^{(n)}:l(e_2)$), and finally concatenates $A_1^{(n+m)}$ with $A_2^{(n+m)}$. This operation is referred to herein as “CrossPC (A₁, A₂)” where the suffix “PC” stands for “path concatenation” and can be expressed as:

$l(\pi_1^{(n)} : \pi_2^{(m)}) = (l(e_{1,1}^{(n)}) : \varepsilon^{(m)}) \cdots (l(e_{1,\alpha}^{(n)}) : \varepsilon^{(m)}) \cdot (\varepsilon^{(n+m)}) \cdot (\varepsilon^{(n)} : l(e_{2,1}^{(m)})) \cdots (\varepsilon^{(n)} : l(e_{2,\beta}^{(m)}))$.

The second embodiment pairs each string tuple of $A_1^{(n)}$ with each string tuple of $A_2^{(m)}$, following the definition in section B.3. This embodiment in actuality pairs each path π₁ of $A_1^{(n)}$ with each path π₂ of $A_2^{(m)}$ transition-wise, and appends epsilon transitions (i.e., ε-transitions) to the shorter of two paired paths, so that both have equal length. This operation is referred to herein as “CrossPA (A₁, A₂)” where the suffix “PA” stands for “path alignment” and can be expressed as:

$l(\pi_1^{(n)} : \pi_2^{(m)}) = (l(e_{1,1}^{(n)}) : l(e_{2,1}^{(m)})) \cdots (l(e_{1,\alpha}^{(n)}) : l(e_{2,\alpha}^{(m)})) \cdot (\varepsilon^{(n)} : l(e_{2,\alpha+1}^{(m)})) \cdots (\varepsilon^{(n)} : l(e_{2,\beta}^{(m)}))$

for α<β, and similarly otherwise.
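The difference between the two label sequences can be illustrated on a single pair of paths. The sketch below is an assumption-laden illustration rather than either embodiment itself: a path is a list of transition labels, each label is a tuple with one string per tape, ε is the empty string, the linking label of path concatenation is taken to be all-epsilon over n+m tapes, and all function names are hypothetical.

```python
from itertools import zip_longest

def cross_pc_labels(path1, path2, n, m):
    """Path concatenation: pad labels of path1 with m epsilons, add one
    all-epsilon linking label, then pad labels of path2 with n epsilons."""
    eps_n, eps_m = ("",) * n, ("",) * m
    first = [label + eps_m for label in path1]
    link = [eps_n + eps_m]
    second = [eps_n + label for label in path2]
    return first + link + second

def cross_pa_labels(path1, path2, n, m):
    """Path alignment: pair labels position by position and pad the
    shorter path with epsilon labels."""
    eps_n, eps_m = ("",) * n, ("",) * m
    return [(l1 if l1 is not None else eps_n) + (l2 if l2 is not None else eps_m)
            for l1, l2 in zip_longest(path1, path2)]

def read_tuple(labels):
    """Concatenate the labels tape by tape to obtain the string tuple."""
    return tuple("".join(tape) for tape in zip(*labels))

path1 = [("a",), ("b",)]          # 1-tape path spelling "ab"
path2 = [("x",), ("y",), ("z",)]  # 1-tape path spelling "xyz"

pc = cross_pc_labels(path1, path2, 1, 1)
pa = cross_pa_labels(path1, path2, 1, 1)
print(len(pc), read_tuple(pc))  # 6 ('ab', 'xyz')
print(len(pa), read_tuple(pa))  # 3 ('ab', 'xyz')
```

Both sequences spell the same string tuple; the concatenated path is longer and every one of its labels carries an epsilon on at least one tape.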

C.1.1 Conditions

Both embodiments for performing the cross-product operation operate under the conditions that:

-   (A) the semirings of the two automata $A_1^{(n)}$ and $A_2^{(m)}$ are equal: K₁ = K₂; and
-   (B) the common semiring K = K₁ = K₂ is commutative (this condition applies to the CrossPA embodiment only):
    ∀w₁, w₂ ∈ K: w₁ ⊗ w₂ = w₂ ⊗ w₁

C.1.2 Path Concatenation Method

A (brute force) method for performing the cross-product operation in accordance with an embodiment, defined as “CrossPC( )”, with path concatenation “PC” may be described as follows in pseudocode:

    CROSSPC(A₁^(n), A₂^(m)) → A :
    1  A ← <Σ₁ ∪ Σ₂, Q₁ ∪ Q₂, i₁, F₂, E₁ ∪ E₂, κ₁>
    2  for ∀e₁ ∈ E₁ do
    3      l(e₁) ← l(e₁):ε^(m)
    4  for ∀e₂ ∈ E₂ do
    5      l(e₂) ← ε^(n):l(e₂)
    6  for ∀q ∈ F₁ do
    7      E ← E ∪ { <q, ε^(n+m), ϱ(q), i₂> }
    8      ϱ(q) ← $\bar{0}$
    9  return A

The pseudocode set forth above for cross-product path concatenation (i.e., CrossPC( )) starts with a WMTA A that is equipped with the unions of the alphabets, the state sets, and the transition sets of A₁ and A₂. The initial state of A equals that of A₁, its set of final states equals that of A₂, and its semiring equals those of A₁ and A₂ (see line 1). First, the labels of all transitions originally coming from A₁ are (post-) paired with ε^(m)-transitions, and the labels of all transitions originally coming from A₂ are (pre-) paired with ε^(n)-transitions. Subsequently, all final states of A₁ are connected with the initial state of A₂ through ε^(n+m)-transitions. As a result, each string n-tuple of A₁ will be physically followed by each string m-tuple of A₂. However, logically those string tuples will be paired since they are on different tapes.

It will be appreciated by those skilled in the art that the paths of the WMTA A become longer in this embodiment than in the other embodiment with path alignment. In addition, it will be appreciated that each transition of A in this embodiment is partially labeled with an epsilon, which may increase the runtime of subsequent operations performed on A. Further it will be appreciated that this embodiment may be readily adapted to operate with non-weighted multi-tape automata (MTAs) by removing the final weight ϱ(q) from line 7 and the semiring K₁ from line 1 and by replacing line 8 with “Final(q)←false”, in the pseudocode above for path concatenation (i.e., CrossPC( )).
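The path concatenation construction can also be sketched at the automaton level. The following is a minimal illustration, not the claimed method: an automaton is assumed to be a dictionary with an initial state, a map from final states to final weights, and a list of transitions (source, label tuple, weight, target); the two state sets are assumed disjoint; ε is the empty string; and the name `cross_pc` is hypothetical.

```python
def cross_pc(a1, a2, n, m):
    """Concatenate a1 (n tapes) and a2 (m tapes) into one (n+m)-tape automaton."""
    eps_n, eps_m = ("",) * n, ("",) * m
    # Pad the labels of a1 on the right and the labels of a2 on the left.
    transitions = [(p, label + eps_m, w, q) for p, label, w, q in a1["transitions"]]
    transitions += [(p, eps_n + label, w, q) for p, label, w, q in a2["transitions"]]
    # Link every final state of a1 to the initial state of a2 with an
    # all-epsilon transition that carries the former final weight.
    for q, final_weight in a1["finals"].items():
        transitions.append((q, eps_n + eps_m, final_weight, a2["initial"]))
    # Only the final states of a2 remain final, so the final states of a1
    # implicitly lose their final weight.
    return {"initial": a1["initial"],
            "finals": dict(a2["finals"]),
            "transitions": transitions}

a1 = {"initial": "p0", "finals": {"p1": 0.5},
      "transitions": [("p0", ("a",), 0.5, "p1")]}
a2 = {"initial": "r0", "finals": {"r1": 1.0},
      "transitions": [("r0", ("x",), 0.5, "r1")]}

a = cross_pc(a1, a2, 1, 1)
for t in a["transitions"]:
    print(t)
```

The single path of the result reads the labels a:ε, ε:ε, ε:x in sequence, which spells the string tuple <a, x>.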

C.1.3 Path Alignment Method

FIG. 5 sets forth a second embodiment in pseudocode for performing the cross-product operation with path alignment “PA” (i.e., CrossPA( )). In FIG. 5, the final weight of an undefined state q=⊥ is assumed to be $\bar{1}$: ϱ(⊥) = $\bar{1}$. In the embodiment shown in FIG. 5, the pseudocode starts with a WMTA A whose alphabet is the union of the alphabets of A₁ and A₂, whose semiring equals those of A₁ and A₂, and that is otherwise empty (see line 1). First, the initial state i of A is created from the initial states of A₁ and A₂ (at line 3), and i is pushed onto the stack (at line 4) which was previously initialized (at line 2). While the stack is not empty, the states q are popped from it to access the states q₁ and q₂ that are assigned to q through μ[q] (see lines 5 and 6).

Further in the embodiment shown in FIG. 5, if both q₁ and q₂ are defined (i.e., ≠⊥), each outgoing transition e₁ of q₁ is paired with each outgoing transition e₂ of q₂ (see lines 7 to 9). Also (at line 13), a transition in A is created whose label is the pair l(e₁):l(e₂) and whose target q′ corresponds to the tuple of targets (n(e₁), n(e₂)). If q′ does not exist yet, it is created and pushed onto the stack (see lines 10 to 12).

In addition in the embodiment shown in FIG. 5 (at lines 14 and 15), if a final state q₁ (with ϱ(q₁) ≠ $\bar{0}$) in A₁ is encountered, the path is followed beyond q₁ on an epsilon-transition that exists only locally (i.e., virtually) but not physically in A₁. The target of the resulting transition in A corresponds to the tuple of targets (n(e₁), n(e₂)) with n(e₁) being undefined (=⊥) because e₁ does not exist physically (see line 17). If a final state q₂ (with ϱ(q₂) ≠ $\bar{0}$) in A₂ is encountered, it is processed similarly (see lines 20 to 25). It will be appreciated by those skilled in the art that this embodiment may be readily adapted to operate with non-weighted multi-tape automata (MTAs) by removing the weights from lines 13, 19, and 25, and the semiring K₁ from line 1, and by replacing line 28 with “Final(q)←Final(q₁) ∧ Final(q₂)”, in the pseudocode shown in FIG. 5.

C.1.4 Complexity

The space complexity of the (brute-force) cross-product path concatenation embodiment described in section C.1.2 is |Q₁|+|Q₂| (i.e., on the order of O(n)) and its running complexity is |F₁|. In contrast, the space complexity of the cross-product path alignment embodiment described in section C.1.3 is (|Q₁|+1)·(|Q₂|+1) (i.e., on the order of O(n²)) and its running complexity is (|E₁|+1)·(|E₂|+1).

C.2 Auto-Intersection

This section describes two methods for performing the auto-intersection operation defined in section B.4.

C.2.1 Conditions of First Method

The method described for performing the auto-intersection operation in this section operates under the condition that the original automaton $A_1^{(n)}$ does not contain cycles labeled only with an epsilon on one and not only with an epsilon on the other of the two tapes involved in the operation. If this condition is violated, the method will stop without producing a result, rather than attempting to create an infinite number of states. It is assumed this undesirable situation occurs rarely (if at all) in natural language processing applications (see for example FIG. 8).

C.2.2 First Method

FIG. 6 sets forth a first method in pseudocode for performing the auto-intersection operation (i.e., AutoIntersect( )). Line 1 of the pseudocode begins with a WMTA A whose alphabet and semiring equal those of A₁ and that is otherwise empty. To each state q that will be created in A (see line 3), three variables are assigned: (i) ν[q]=q₁ that indicates the corresponding state q₁ in A₁ (see line 24), (ii) Ψ[q]=q̈ that indicates the previous state q̈ in A on the current path (back pointer) (see line 25), and (iii) ξ[q]=(s,u) that states the leftover string s of tape j (yet unmatched in tape k) and leftover string u of tape k (yet unmatched in tape j) (see line 26).

At lines 3-4 and 20-27, an initial state i in A is created and pushed onto the stack defined at line 2. As long as the stack is not empty, the states q are popped from the stack and each of the outgoing transitions e₁ ∈ E(q₁) of the corresponding state q₁ = ν[q] in A₁ is followed (see line 5); each such transition is represented as a transition e ∈ E(q) in A with the same label and weight. To compile the leftover strings ξ[q′]=(s′, u′) of its target q′=n(e) in A, the leftover strings ξ[q]=(s,u) of its source q=p(e) are concatenated with the j-th and k-th components of its label, l_j(e₁) and l_k(e₁), and the longest common prefix of the resulting strings s·l_j(e₁) and u·l_k(e₁) is removed (see lines 7 and 16-19).

If both leftover strings s′ and u′ of q′ are non-empty (i.e., ≠ε) then they are incompatible and the path that is being followed is invalid. In this case, the transition e ∈ E(q) and its target q′ in A are not constructed. If either s′ or u′ is empty (i.e., =ε), then the current path is valid (at least up to this point) (see line 8).

At line 9, a test is made to determine whether the process will not terminate, which is the case if a cycle in A₁ was traversed and the ξ[q̂] at the beginning of the cycle differs from the ξ[n(e)]=(s′,u′) at its end. In this case the states q̂ and n(e) are not equivalent, and cannot be represented through one state in A, although they correspond to the same state q₁ in A₁. So that the process does not traverse the cycle an infinite number of times and create a new state on each traversal, the process aborts at line 10.

At line 14 (if the process did not abort at line 10), a transition e in A is constructed. If its target q′=n(e) does not exist yet, it is created and pushed onto the stack at lines 11-13. It will be appreciated by those skilled in the art that this embodiment may be readily adapted to operate with non-weighted multi-tape automata (MTAs) by removing the weight w(e₁) from line 14 and the semiring K₁ from line 1, and by replacing line 22 with “Final(q)←Final(q₁)” and line 23 with “Final(q)←false”, in the pseudocode shown in FIG. 6.

C.2.3 Example of First Method

FIG. 7 illustrates an example of the first method for performing the auto-intersection operation shown in FIG. 6. In the example shown in FIG. 7, the language of A₁⁽²⁾ is the infinite set of string tuples <ab*, a*b>. Only one of those tuples, namely <ab, ab>, is in the language of the auto-intersection A⁽²⁾ = I₁,₂(A₁⁽²⁾) because all other tuples contain different strings on tapes 1 and 2. In FIG. 7, weights of each WMTA are omitted and dashed states and transitions are not constructed.

The method described above in section C.2.2 first builds the initial state 0 of A⁽²⁾ with ν[0]=0, Ψ[0]=⊥, and ξ[0]=(ε,ε). Then the only outgoing transition of the state referenced by ν[0] is selected, which is labeled a:ε, and the leftover strings of its target, state 1, are compiled by concatenating ξ[0]=(ε,ε) with the label a:ε. This gives first (a,ε) and then, after removal of the longest common prefix (ε in this case), ξ[1]=(a,ε). State 1 is created because ξ[1] meets the constraints defined in the method. It is assigned ν[1]=1, because it corresponds to state 1 in A₁, Ψ[1]=0, because it is (at present) reached from state 0 in A, and ξ[1]=(a,ε). State 1 in A is not final (unlike state 1 in A₁) because ξ[1]≠(ε,ε).

The remaining strings of state 2 in A result from ξ[1]=(a,ε) and the transition label ε:b, and are ξ[2]=(a,b). State 2 and its incoming transition are not created because ξ[2] does not meet the defined constraints. All other states and transitions are similarly constructed.
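The leftover-string bookkeeping traced above can be sketched in a few lines. This is only an illustration: strings are plain Python strings with ε as the empty string, and the helper names `lcp`, `next_leftover`, and `is_valid` are hypothetical.

```python
def lcp(s, u):
    """Longest common prefix of the strings s and u."""
    i = 0
    while i < len(s) and i < len(u) and s[i] == u[i]:
        i += 1
    return s[:i]

def next_leftover(xi, label_j, label_k):
    """Append the tape-j and tape-k parts of a transition label to the
    leftover strings of the source state, then strip the common prefix."""
    s, u = xi
    s2, u2 = s + label_j, u + label_k
    cut = len(lcp(s2, u2))
    return s2[cut:], u2[cut:]

def is_valid(xi):
    """A path stays valid only while at least one leftover string is empty."""
    s, u = xi
    return s == "" or u == ""

xi0 = ("", "")                       # initial state
xi1 = next_leftover(xi0, "a", "")    # follow the transition labeled a:ε
xi2 = next_leftover(xi1, "", "b")    # follow the transition labeled ε:b
print(xi1, is_valid(xi1))  # ('a', '') True
print(xi2, is_valid(xi2))  # ('a', 'b') False
```

A transition labeled ε:a taken from ξ=(a,ε) would instead give (ε,ε), since the common prefix a is removed from both strings.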

FIG. 8 illustrates an example where the first method for performing the auto-intersection operation shown in FIG. 6 fails to construct the auto-intersection of a WMTA whose language is <a*a, aa*, x*yz*>, because that auto-intersection is actually not finite-state (see conditions in section C.2.1). The failure condition is met at states 2 and 3 of the new automaton A. In FIG. 8, weights of each WMTA are omitted and dashed states and transitions are not constructed.

C.2.4 Conditions of Second Method

The second method, unlike the first method, has no conditions. The second method detects whether the auto-intersection of a given WMTA $A_1^{(n)}$ is regular (i.e., whether it can be represented as an automaton). If it is found to be regular, the second method creates a WMTA $A = \mathcal{I}_{j,k}(A_1^{(n)})$; otherwise, if it is found not to be regular and the complete result cannot be represented by an automaton, the second method creates the automaton $A \subset \mathcal{I}_{j,k}(A_1^{(n)})$, which is a partial result.

Briefly, the second method performs auto-intersection by assigning leftover strings to states using variable ξ[q]=(s,u) and makes use of the concept of delay using variable δ(s,u), which are both defined in Table 1. The variable χ[q] defines a pair of integers assigned to a state q, which expresses the lengths of two strings s and u on different tapes of the same path in a WMTA ending at q. The concept of delay provides that given a path in a WMTA, the delay of its states q is the difference of lengths of the strings on the tapes j and k up to q. It is expressed as the function δ(s,u)=|s|−|u|, where |s| is the length of string s and |u| is the length of string u.
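The delay can be made concrete with a short sketch. It assumes a path is given as a list of labels, each a tuple with one string per tape and ε as the empty string; the function names and the sample path are hypothetical.

```python
def delay(s, u):
    """Delay between two strings: length of s minus length of u."""
    return len(s) - len(u)

def delays_along_path(labels, j, k):
    """Delay at every state of a path, starting from the initial state.

    The running pair (chi_j, chi_k) holds the lengths of the strings read
    so far on tapes j and k (1-based), as the variable chi[q] does."""
    chi_j, chi_k = 0, 0
    delays = [0]
    for label in labels:
        chi_j += len(label[j - 1])
        chi_k += len(label[k - 1])
        delays.append(chi_j - chi_k)
    return delays

# Two transitions reading "aa" on tape 1, then one reading "aaa" on tape 2.
path = [("aa", ""), ("aa", ""), ("", "aaa")]
print(delays_along_path(path, 1, 2))  # [0, 2, 4, 1]
print(delay("a", ""))                 # 1
```

Removing a common prefix shortens both strings by the same amount, so the delay of a pair of leftover strings equals the delay of the full strings read on the two tapes.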

C.2.5 Second Method

FIG. 9 sets forth the second method in pseudocode for performing the auto-intersection operation (i.e., AutoIntersect( )). The method is based on the observation that if the auto-intersection $A^{(n)} = \mathcal{I}_{j,k}(A_1^{(n)})$ of a WMTA $A_1^{(n)}$ is regular, the delay will not exceed a limit δ_max at any state q of $A^{(n)}$. If it is not regular, the delay will exceed any limit, but it is possible to construct a regular part, $A_p^{(n)}$, of the auto-intersection within the limit of δ_max and a larger regular part, $A_{p2}^{(n)}$, within a larger limit δ_max2 (i.e., $A_p^{(n)} \subset A_{p2}^{(n)} \subset \mathcal{I}_{j,k}(A_1^{(n)})$).

By way of overview, the second method set forth in FIG. 9 and described in detail below involves three operations. First, two limits are computed corresponding to delays δ_max and δ_max2 of the auto-intersection (i.e., line 1). The first delay δ_max is computed by traversing the input automaton and measuring the delays along all its paths. The second delay is computed similarly to the first delay while traversing an additional cycle of the input automaton. Next, the auto-intersection of the automaton is constructed using the delay δ_max2 (i.e., lines 2-10, where the second limit serves to delimit construction of the automaton). Finally, the constructed automaton is tested for regularity using the delay δ_max (i.e., line 11, where the first limit serves to determine whether the auto-intersection is regular).

C.2.5.A Compile Limits

In FIG. 9, a maximal delay, δ_max, is compiled. The maximal delay, δ_max, can occur at any point between the two strings l_j(π) and l_k(π) on any path π of $\mathcal{I}_{j,k}(A_1^{(n)})$ if it is regular. Let $\mathcal{R}(A_1^{(2)}) = (\{\langle aa, \varepsilon \rangle\} \cup \{\langle \varepsilon, aaa \rangle\})^*$, encoded by two cycles (as provided at lines 1 and 27-42), where $\mathcal{R}(A^{(n)})$ is the n-tape relation of $A^{(n)}$. To obtain a match between l₁(π) and l₂(π) in $A^{(2)} = \mathcal{I}_{1,2}(A_1^{(2)})$, the first cycle must be traversed three times and the second cycle two times, allowing for any permutation: $A^{(2)} = (\langle aa, \varepsilon \rangle^3 \langle \varepsilon, aaa \rangle^2 \cup \langle aa, \varepsilon \rangle^2 \langle \varepsilon, aaa \rangle^2 \langle aa, \varepsilon \rangle^1 \cup \ldots)^*$. This illustrates that in a match between any two cycles of $A_1^{(n)}$, the absolute value of the delay does not exceed $\delta_{cyc} = \hat{\delta}_{cyc} \cdot \max(1, \hat{\delta}_{cyc} - 1)$, with $\hat{\delta}_{cyc}$ being the maximal absolute value of the delay of any cycle (as provided at line 30). For any other kind of match, the difference between the maximal and the minimal delay, $\hat{\delta}_{max}$ and $\hat{\delta}_{min}$, encountered at any (cyclic or acyclic) path π of $A_1^{(n)}$ is taken into account. Therefore, the absolute value of the delay in $A^{(n)}$ does not exceed $\delta_{max} = \max(\hat{\delta}_{max} - \hat{\delta}_{min}, \delta_{cyc})$ if $\mathcal{I}_{j,k}(A_1^{(n)})$ is regular (as provided at line 31). If it is non-regular, then δ_max will limit A to a regular subset of the auto-intersection, $A^{(n)} \subset \mathcal{I}_{j,k}(A_1^{(n)})$.

Next, a second limit, δ_max2, is compiled that permits, in case of non-regularity, the construction of a larger regular subset $A^{(n)} \subset \mathcal{I}_{j,k}(A_1^{(n)})$ than δ_max does. Non-regularity can only result from matching cycles in $A_1^{(n)}$. To obtain a larger subset of $\mathcal{I}_{j,k}(A_1^{(n)})$, the cycles of $A_1^{(n)}$ must be unrolled further until one more match between two cycles is reached. Therefore, δ_max2 = δ_max + δ_cyc (as provided at line 32).
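The arithmetic of the two limits can be sketched as follows. The sketch assumes the cycle limit is read as the maximal absolute cycle delay multiplied by max(1, that delay minus 1), and it takes the cycle delays and the extreme path delays as given inputs rather than measuring them on an automaton. The function name and the input values of the example are hypothetical.

```python
def compile_limits(cycle_delays, max_path_delay, min_path_delay):
    """Return (delta_cyc, delta_max, delta_max2) from measured delays.

    cycle_delays   -- delay of each cycle of the input automaton
    max_path_delay -- largest delay met on any path
    min_path_delay -- smallest delay met on any path
    """
    # Maximal absolute delay of any cycle (0 if the automaton is acyclic).
    hat_cyc = max((abs(d) for d in cycle_delays), default=0)
    delta_cyc = hat_cyc * max(1, hat_cyc - 1)
    delta_max = max(max_path_delay - min_path_delay, delta_cyc)
    # One further match between two cycles is allowed for the second limit.
    delta_max2 = delta_max + delta_cyc
    return delta_cyc, delta_max, delta_max2

# Two cycles with delays +2 (label aa:ε) and -3 (label ε:aaa); the extreme
# path delays 2 and -3 are assumed values for the sake of the example.
print(compile_limits([2, -3], 2, -3))  # (6, 6, 12)
```

For the two-cycle relation discussed above, the cycle limit of 6 corresponds to traversing the first cycle three times before the second cycle is entered.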

C.2.5.B Construct Auto-Intersection

Construction starts with a WMTA A whose alphabet and semiring equal those of A₁ and that is otherwise empty (as provided at line 2). To each state q that will be created in WMTA A, two variables are assigned: (i) ν[q]=q₁ indicating the corresponding state q₁ in A₁; and (ii) ξ[q]=(s,u), which sets forth the leftover string s of tape j (yet unmatched in tape k) and the leftover string u of tape k (yet unmatched in tape j).

Subsequently, an initial state i in A is created and pushed onto a stack (as provided at lines 4 and 17-26). As long as the stack is not empty, states q are taken from it and each of the outgoing transitions e₁ ∈ E(q₁) of the corresponding state q₁=ν[q] in A₁ is followed (as provided in lines 5 and 6). A transition e₁ in A₁ is represented as e ∈ E(q) in A, with the same label and weight. To compile the leftover strings ξ[q′]=(s′, u′) of its target q′=n(e) in A, the leftover strings ξ[q]=(s, u) of its source q=p(e) are concatenated with the j-th and k-th components of its label, l_j(e₁) and l_k(e₁), and the longest common prefix of the resulting strings s·l_j(e₁) and u·l_k(e₁) is removed (as provided in lines 7 and 13-16).

If both leftover strings s′ and u′ of q′ are non-empty (i.e., ≠ε) then they are incompatible and the path that is being followed is invalid. If either s′ or u′ is empty (i.e., =ε) then the current path is valid (at least up to this point) (as provided in line 8). Only in this case and only if the delay between s′ and u′ does not exceed δ_max2, a transition e in A is constructed corresponding to e₁ in A₁ (as provided in line 10). If its target q′=n(e) does not exist yet, it is created and pushed onto the stack (as provided in lines 9 and 17-26). The infinite unrolling of cycles is prevented by δ_max2.

C.2.5.C Test Regularity of Auto-Intersection

Finally, the constructed auto-intersection of the WMTA A is tested for regularity. From the above discussion of δ_max and δ_max2 it follows that if $\mathcal{I}_{j,k}(A_1^{(n)})$ is regular then none of the states that are both reachable and coreachable have |δ(ξ[q])| > δ_max. Further, if $\mathcal{I}_{j,k}(A_1^{(n)})$ is non-regular then a WMTA $A^{(n)}$ built with the limit δ_max2 is bigger than a WMTA $A^{(n)}$ built with the limit δ_max, and hence has states with |δ(ξ[q])| > δ_max that are both reachable and coreachable. Since all states of $A^{(n)}$ are reachable, due to the manner in which the WMTA A is constructed, it is sufficient to test for their coreachability (as provided at line 11) to know whether the WMTA A is regular (i.e., the delay of WMTA $A^{(n)}$ will not exceed a limit δ_max at any state q of $A^{(n)}$).
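The regularity test described above amounts to a backward reachability search from the final states. The following sketch assumes the constructed automaton is given as a list of (source, target) state pairs, a collection of final states, and a map from each state to its delay δ(ξ[q]); these representations and the function names are hypothetical.

```python
from collections import deque

def coreachable_states(transitions, finals):
    """States from which some final state can be reached."""
    predecessors = {}
    for source, target in transitions:
        predecessors.setdefault(target, set()).add(source)
    seen = set(finals)
    queue = deque(finals)
    while queue:
        q = queue.popleft()
        for p in predecessors.get(q, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def is_regular(transitions, finals, delay_of, delta_max):
    """True if no coreachable state has a delay whose absolute value
    exceeds delta_max."""
    coreachable = coreachable_states(transitions, finals)
    return not any(abs(delay_of[q]) > delta_max for q in coreachable)

delay_of = {0: 0, 1: 1, 2: 0, 3: 3}
dead_end = [(0, 1), (1, 2), (1, 3)]            # state 3 leads nowhere
live_end = [(0, 1), (1, 2), (1, 3), (3, 2)]    # state 3 reaches final state 2

print(is_regular(dead_end, [2], delay_of, 2))  # True
print(is_regular(live_end, [2], delay_of, 2))  # False
```

In the first automaton the state with the excessive delay is not coreachable and is ignored; in the second it lies on a successful path, which signals a non-regular auto-intersection.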

C.2.6 Examples of Second Method

FIGS. 10 and 11 each present two different automata for illustrating the method for performing the auto-intersection operation set forth in FIG. 9 which results in a WMTA $A^{(n)}$ that is regular. In contrast, FIG. 12 presents two automata for illustrating an example of the method for performing the auto-intersection operation set forth in FIG. 9 which results in a WMTA $A^{(n)}$ that is not regular. In FIGS. 10-12, the dashed portions are not constructed by the method set forth in FIG. 9, and the states q in the Figures identified with an arrow have a delay such that |δ(ξ[q])| > δ_max.

FIG. 10 illustrates the WMTA A₁⁽²⁾ and its auto-intersection A⁽²⁾. In FIG. 10, the relation of the WMTA A₁⁽²⁾ is the infinite set of string tuples $\{\langle ab^k, a^k b \rangle \mid k \in N\}$ (where N is the set of natural numbers). Only one of those tuples, namely <ab, ab>, is in the relation of the auto-intersection $A^{(2)} = \mathcal{I}_{1,2}(A_1^{(2)})$ because all other tuples contain different strings on tape 1 and tape 2. In accordance with the method for performing the auto-intersection operation set forth in FIG. 9, the following four steps are performed (namely, at (1) the limits are computed, at (2)-(3) the auto-intersection is computed, and at (4) the auto-intersection is tested for regularity):

$\delta_{max} = \delta_{max2} = 1$ (1)

$\mathcal{R}(A_1^{(2)}) = \{\langle ab^k, a^k b \rangle \mid k \in N\}$ (2)

$\mathcal{I}_{1,2}(\mathcal{R}(A_1^{(2)})) = \mathcal{R}(A^{(2)}) = \{\langle ab^1, a^1 b \rangle\}$ (3)

$\nexists q \in Q : |\delta(\xi[q])| > \delta_{max} \wedge coreachable(q) \rightarrow regular$ (4)

FIG. 11 illustrates the WMTA A₁⁽³⁾ and its auto-intersection $A^{(3)} = \mathcal{I}_{1,2}(A_1^{(3)})$, where the state 3 of A⁽³⁾ in FIG. 11 identified with an arrow has a delay such that |δ(ξ[q])| > δ_max. In accordance with the method for performing the auto-intersection operation set forth in FIG. 9, the following five steps are performed (namely, at (1)-(2) the limits are computed, at (3)-(4) the auto-intersection is computed, and at (5) the auto-intersection is tested for regularity):

$\delta_{max} = 2$ (1)

$\delta_{max2} = 3$ (2)

$\mathcal{R}(A_1^{(3)}) = \{\langle a^k, a, x^k y \rangle \mid k \in N\}$ (3)

$\mathcal{I}_{1,2}(\mathcal{R}(A_1^{(3)})) = \mathcal{R}(A^{(3)}) = \{\langle a^1, a, x^1 y \rangle\}$ (4)

$\nexists q \in Q : |\delta(\xi[q])| > \delta_{max} \wedge coreachable(q) \rightarrow regular$ (5)

FIG. 12 illustrates the WMTA A₁⁽³⁾ and its auto-intersection $A^{(3)} = \mathcal{I}_{1,2}(A_1^{(3)})$, where the states 3, 8, and 11 of A⁽³⁾ in FIG. 12 identified with an arrow have a delay such that |δ(ξ[q])| > δ_max. In accordance with the method for performing the auto-intersection operation set forth in FIG. 9, the following six steps are performed (namely, at (1)-(2) the limits are computed, at (3)-(5) the auto-intersection is computed, and at (6) the auto-intersection is tested for regularity):

$\delta_{max} = 2$ (1)

$\delta_{max2} = 3$ (2)

$\mathcal{R}(A_1^{(3)}) = \{\langle a^k a, a a^h, x^k y z^h \rangle \mid k, h \in N\}$ (3)

$\mathcal{I}_{1,2}(\mathcal{R}(A_1^{(3)})) = \{\langle a^k a, a a^k, x^k y z^k \rangle \mid k \in N\}$ (4)

$\mathcal{I}_{1,2}(\mathcal{R}(A_1^{(3)})) \supset \mathcal{R}(A^{(3)}) = \{\langle a^k a, a a^k, x^k y z^k \rangle \mid k \in [0,3]\}$ (5)

$\exists q \in Q : |\delta(\xi[q])| > \delta_{max} \wedge coreachable(q) \rightarrow non\text{-}regular$ (6)

C.3 Single-Tape Intersection

This section sets forth two alternate embodiments for performing, in one step, the single-tape intersection operation of two WMTAs $A_1^{(n)}$ and $A_2^{(m)}$ defined in section B.5 above. Instead of first building the cross-product, $A_1^{(n)} \times A_2^{(m)}$, and then deleting most of its paths by the auto-intersection operation, $\mathcal{I}_{j,n+k}( )$, defined in section B.4, both embodiments construct only the useful part of the cross-product. However, it will be understood by those skilled in the art that an alternate embodiment follows the definition and performs single-tape intersection in multiple steps.

The first embodiment is a method for performing single-tape intersection similar to known methods for performing composition, and is not adapted to handle WMTAs that contain epsilons on the intersected tapes, j and k. The first embodiment is referred to herein as IntersectCross(A₁, A₂, j, k), which is defined as:

$IntersectCross(A_1, A_2, j, k) = \mathcal{I}_{j,n+k}(A_1^{(n)} \times A_2^{(m)})$

$A_1^{(n)} \bigcap_{j,k} A_2^{(m)} = \overline{\mathcal{P}}_{n+k}(IntersectCross(A_1, A_2, j, k))$

It will be appreciated by those skilled in the art that the complementary projection operation, $\overline{\mathcal{P}}_{n+k}( )$, defined in section B.2 may be integrated into the first embodiment in order to avoid an additional pass. However, it is kept separate from IntersectCross( ) because IntersectCross( ) may serve also as a building block of another operation where the complementary projection operation must be postponed (e.g., multi-tape intersection).

The second embodiment simulates the behavior of an epsilon-filter transducer for composing transducers with epsilon-transitions that is disclosed by Mohri, Pereira, and Riley in “A rational design for a weighted finite-state transducer library”, published in Lecture Notes in Computer Science, 1436:144-158, 1998, which is incorporated herein by reference and hereinafter referred to as “Mohri's Epsilon-Filter Publication”. The second embodiment is referred to herein as IntersectCrossEps(A₁, A₂, j, k), where the suffix “Eps” expresses its suitability for WMTAs with epsilons on the intersected tapes, j and k.

Similar to the first embodiment, it will be appreciated by those skilled in the art that the complementary projection operation, P̄_(n+k)( ), defined in section B.2 may be integrated into the second embodiment in order to avoid an additional pass. However, it is kept separate from IntersectCrossEps( ) because IntersectCrossEps( ) may also serve as a building block of other operations where the complementary projection operation must be postponed or extended (e.g., classical composition and multi-tape intersection).

C.3.1 Conditions

Both embodiments for performing the single-tape intersection operation operate under the conditions that:

-   (A) the semirings of the two automata $A_1^{(n_1)}$ and $A_2^{(n_2)}$ are equal: K₁ = K₂;
-   (B) the common semiring K = K₁ = K₂ is commutative: ∀w₁, w₂ ∈ K: w₁ ⊗ w₂ = w₂ ⊗ w₁;
-   (C) for IntersectCross( ): neither of the two intersected tapes contains ε: (∄e₁ ∈ E₁ : l_j(e₁) = ε) ∧ (∄e₂ ∈ E₂ : l_k(e₂) = ε).

C.3.2 First Embodiment

FIG. 13 sets forth a method in pseudocode of the first embodiment for performing the single-tape intersection operation (i.e., A = IntersectCross(A₁, A₂, j, k)). Line 1 begins with a WMTA A whose semiring equals those of A₁ and A₂ and that is otherwise empty. At lines 3 and 12-18, the initial state i of the WMTA A is created from the initial states of A₁ and A₂ and pushed onto the stack initialized at line 2. While the stack is not empty, the states q are popped from it, and the states q₁ and q₂ that are assigned to q through μ[q] are accessed at lines 4 and 5.

At lines 6 and 7, each outgoing transition e₁ of q₁ is intersected with outgoing transitions e₂ of q₂. This succeeds at line 8 only if the j-th labeled component of e₁ equals the k-th labeled component of e₂, where j and k are the two intersected tapes of A₁ and A₂, respectively. Only if it succeeds at line 8 will a transition be created at line 10 for A whose label results from pairing l(e₁) with l(e₂) and whose target q′ corresponds with the pair of targets (n(e₁), n(e₂)). If q′ does not exist yet, it is created and pushed onto the stack at lines 9 and 12-18.

It will be appreciated by those skilled in the art that this embodiment may be readily adapted to operate with non-weighted multi-tape automata (MTAs) by removing the weights from line 10 and the semiring K₁ from line 1, and by replacing line 15 with “Final(q) ← Final(q₁) ∧ Final(q₂)” in the pseudocode shown in FIG. 13.
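By way of a non-limiting illustration, the stack-driven pairing of transitions described for FIG. 13 may be sketched in Python as follows. This is a minimal sketch and not the pseudocode of FIG. 13 itself: the `WMTA` record, the function name `intersect_cross`, the 0-based tape indices, and the use of ordinary multiplication as the semiring product are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical minimal automaton record (an assumption of this sketch):
#   initial     - the initial state
#   finals      - dict mapping each final state to its final weight
#   transitions - list of (source, label_tuple, weight, target)
WMTA = namedtuple("WMTA", "initial finals transitions")


def intersect_cross(a1, a2, j, k):
    """Pair transitions whose labels agree on tape j of a1 and tape k of a2.

    Tapes are 0-based here; epsilons on the intersected tapes are not handled.
    """
    start = (a1.initial, a2.initial)
    finals, transitions = {}, []
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        q1, q2 = q
        # A paired state is final only if both of its components are final.
        if q1 in a1.finals and q2 in a2.finals:
            finals[q] = a1.finals[q1] * a2.finals[q2]
        for (p1, l1, w1, n1) in a1.transitions:
            if p1 != q1:
                continue
            for (p2, l2, w2, n2) in a2.transitions:
                if p2 != q2 or l1[j] != l2[k]:
                    continue
                target = (n1, n2)
                # The new label pairs both labels; the weights are multiplied.
                transitions.append((q, l1 + l2, w1 * w2, target))
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return WMTA(start, finals, transitions)


a1 = WMTA(0, {1: 1.0}, [(0, ("a", "x"), 0.5, 1), (0, ("b", "y"), 0.5, 1)])
a2 = WMTA(0, {1: 2.0}, [(0, ("a",), 3.0, 1)])
result = intersect_cross(a1, a2, 0, 0)
```

Only states reachable through matching transitions are ever created, which is the sense in which only the useful part of the cross-product is constructed. The redundant second copy of the intersected tape remains in the label until a complementary projection removes it.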

C.3.3 Mohri's Epsilon-Filter

The second embodiment for performing the single-tape intersection operation makes use of an epsilon-filter transducer similar to its use in the composition of two transducers with an epsilon-transition that is described by Mohri et al. in Mohri's Epsilon-Filter Publication. This section describes the use of Mohri's epsilon-filter by simulating its use.

FIG. 14 illustrates Mohri's epsilon-filter A_(ε) and two transducers A₁ and A₂. The transducers A₁ and A₂ are pre-processed for filtered composition (where x = {φ₁, φ₂, ε₁, ε₂}), and each ε in tape 2 of A₁⁽²⁾ is replaced by an ε₁ and each ε in tape 1 of A₂⁽²⁾ by an ε₂. In addition, a looping transition labeled with ε:φ₁ is added to each state of A₁⁽²⁾, and a loop labeled with φ₂:ε to each state of A₂⁽²⁾. The pre-processed transducers are then composed with the filter A_(ε)⁽²⁾ in between: A₁ ⋄ A_(ε) ⋄ A₂. The filter controls how epsilon-transitions are composed along each pair of paths in A₁ and A₂, respectively. As long as there are equal symbols (ε or not) on the two paths, they are composed with each other, and the state does not change in A_(ε) from state zero. If a sequence of ε in A₁ but not in A₂ is encountered, the state advances in A₁, and the state does not change in A₂ from the state shown and in A_(ε) from state 1. If a sequence of ε in A₂ but not in A₁ is encountered, the state advances in A₂, and the state does not change in A₁ from the state shown and in A_(ε) from state 2.

C.3.4 Second Embodiment

FIG. 15 sets forth a method in pseudocode of the second embodiment for performing the single-tape intersection operation (i.e., A = IntersectCrossEps(A₁, A₂, j, k)). Unlike the embodiment set forth in section C.3.2 which builds the cross-product, A₁×A₂, and then deletes some of its paths by auto-intersection, I_(j,n+k)( ), this embodiment simulates the behavior of Mohri's epsilon-filter transducer described in section C.3.3 without Mohri's epsilon-filter transducer being present by adding an attribute at each state of the resulting WMTA A, thereby allowing only the useful parts of the cross-product to be constructed.

More specifically, in FIG. 15, line 1 begins with a WMTA A whose alphabet is the union of the alphabets of transducers A₁ and A₂, whose semiring equals those of A₁ and A₂, and that is otherwise empty. At lines 3 and 20-26, the initial state i of A is created from the initial states of A₁, A₂, and A_(ε), and the initial state i is pushed onto the stack initialized at line 2. While the stack is not empty, states q are popped from it, and the states q₁, q₂, and q_(ε) that are assigned to states q through V[q] are accessed at lines 4-5.

At lines 6-7, each outgoing transition e₁ of state q₁ is intersected with each outgoing transition e₂ of state q₂. This succeeds at line 8 only if the j-th labeled component of e₁ equals the k-th labeled component of e₂, where j and k are the two intersected tapes of A₁ and A₂, respectively, and if the corresponding transition in A_(ε) has target zero. Only if it succeeds at line 8 is a transition created in A at line 10 (from the current source state) whose label results from pairing l(e₁) with l(e₂) and whose target state q′ corresponds with the triple of targets (n(e₁), n(e₂), 0). If state q′ does not exist yet, it is created and pushed onto the stack at lines 20-26.

Subsequently, all epsilon-transitions are handled in A₁ at lines 11-14 and in A₂ at lines 15-18. If an epsilon is encountered in A₁ and the automaton A_(ε) is in state 0 or in state 1, the current state advances in A₁, does not change in A₂, and advances to state 1 in A_(ε). At lines 11-14, a transition in A is therefore created whose target corresponds to the triple (n(e₁), q₂, 1). Corresponding actions take place at lines 15-18 if an epsilon is encountered in A₂.
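The simulation of the epsilon-filter by a third state component may be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the pseudocode of FIG. 15: the `WMTA` record, the empty string as the encoding of ε, the 0-based tape indices, ordinary multiplication as the semiring product, and the rule that a simultaneous ε:ε pair is accepted only from filter state 0 (taken from the usual three-state formulation of the filter) are all assumptions.

```python
from collections import namedtuple

EPS = ""  # assumed encoding of the empty symbol on a tape
# Hypothetical automaton record: number of tapes, initial state,
# {final state: weight}, and a list of (source, label, weight, target).
WMTA = namedtuple("WMTA", "tapes initial finals transitions")


def outgoing(a, q):
    """Return the transitions of automaton a that leave state q."""
    return [t for t in a.transitions if t[0] == q]


def intersect_cross_eps(a1, a2, j, k):
    """Single-tape intersection on tape j of a1 and tape k of a2 (0-based).

    Each state of the result is a triple (q1, q2, f), where f in {0, 1, 2}
    plays the role of the current state of the epsilon-filter.
    """
    start = (a1.initial, a2.initial, 0)
    finals, transitions = {}, []
    seen, stack = {start}, [start]

    def add(src, label, weight, tgt):
        transitions.append((src, label, weight, tgt))
        if tgt not in seen:
            seen.add(tgt)
            stack.append(tgt)

    while stack:
        q = stack.pop()
        q1, q2, f = q
        if q1 in a1.finals and q2 in a2.finals:
            finals[q] = a1.finals[q1] * a2.finals[q2]
        # Equal symbols are paired and the filter returns to state 0.
        for (_, l1, w1, n1) in outgoing(a1, q1):
            for (_, l2, w2, n2) in outgoing(a2, q2):
                if l1[j] == l2[k] and (l1[j] != EPS or f == 0):
                    add(q, l1 + l2, w1 * w2, (n1, n2, 0))
        # Epsilon in a1 only: advance in a1, stay in a2, filter goes to 1.
        for (_, l1, w1, n1) in outgoing(a1, q1):
            if l1[j] == EPS and f in (0, 1):
                add(q, l1 + (EPS,) * a2.tapes, w1, (n1, q2, 1))
        # Epsilon in a2 only: advance in a2, stay in a1, filter goes to 2.
        for (_, l2, w2, n2) in outgoing(a2, q2):
            if l2[k] == EPS and f in (0, 2):
                add(q, (EPS,) * a1.tapes + l2, w2, (q1, n2, 2))
    return WMTA(a1.tapes + a2.tapes, start, finals, transitions)


a1 = WMTA(2, 0, {2: 1.0}, [(0, (EPS, "X"), 1.0, 1), (1, ("a", "Y"), 1.0, 2)])
a2 = WMTA(1, 0, {1: 1.0}, [(0, ("a",), 1.0, 1)])
r = intersect_cross_eps(a1, a2, 0, 0)
```

The filter component is what prevents the same pairing of epsilons from being reached along several interleavings: once A₁ has advanced alone on ε (filter state 1), A₂ may not advance alone on ε until a non-epsilon symbol has been matched, and vice versa.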

It will be appreciated by those skilled in the art that this embodiment may be readily adapted to operate with non-weighted multi-tape automata (MTAs) by removing the weights from lines 10, 14, and 18 and the semiring K₁ from line 1, and by replacing line 23 with “Final(q) ← Final(q₁) ∧ Final(q₂)” in the pseudocode shown in FIG. 15.

It will further be appreciated by those skilled in the art that the embodiment shown in FIG. 15 for performing single-tape intersection of transducers A₁ and A₂ may be readily adapted to permit the composition of a first tape of the automaton A₁ and a second tape of the automaton A₂ while retaining one of the first and the second tapes and all other tapes of both transducers at the transitions where the tapes intersect, by revising the labels at each transition E created at lines 10, 14, and 18 to remove one component (i.e., removing either the first tape or the second tape to eliminate redundant paths through complementary projection).

In addition, it will be appreciated by those skilled in the art that the embodiment set forth in FIG. 15 may be readily adapted to perform a classical-composition operation where both the first tape of the automaton A₁ and the second tape of the automaton A₂ are removed while retaining all other tapes of both transducers at the transitions where the tapes intersect, by revising the labels at each transition E created at lines 10, 14, and 18 to remove both components (i.e., removing both the first tape and the second tape).

C.3.5 Complexity

The worst-case complexity of both the first embodiment and the second embodiment for carrying out single-tape intersection is |E₁|·|E₂| in space and runtime (i.e., on the order of O(n²)). The complexity of the second embodiment is only linearly greater than that of the first, which may be significant for large WMTAs.

C.4 Multi-Tape Intersection

This section sets forth two embodiments for performing the multi-tape intersection operation of two WMTAs A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾ defined in section B.6. The first embodiment, referred to herein as Intersect1(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r), follows the definition of multi-tape intersection while performing operations for cross-product, auto-intersection, and complementary projection. The second embodiment, which is more efficient and is referred to herein as Intersect2(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r), makes use of methods that perform cross-product and auto-intersection in one step (for intersecting tapes j₁ and k₁), and then performs the auto-intersection (for any intersecting tapes j_i with k_i, for i>1).

C.4.1 Conditions

Both embodiments for performing the multi-tape intersection operation operate under the conditions that:

-   (a) the semirings of the two automata A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾ are equal: K₁ = K₂;
-   (b) the common semiring K = K₁ = K₂ is commutative: ∀w₁, w₂ ∈ K: w₁ ⊗ w₂ = w₂ ⊗ w₁; and
-   (c) A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾ have no cycles exclusively labeled with epsilon on any of the intersected tapes: j_i ∈ [[1,n]] and k_i ∈ [[1,m]], for i ∈ [[1,r]].

C.4.2 Embodiments

The first embodiment, Intersect1( ), for carrying out the multi-tape intersection operation defined in section B.6 may be expressed in pseudocode as follows:

(a) in one embodiment, while referring at line 1 to CrossPA( ) (to perform a cross-product operation on the WMTAs A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾) set forth in FIG. 5, at line 3 to the first AutoIntersect( ) method (to perform the auto-intersection operation for each pair of intersecting tapes j_i and k_i) set forth in FIG. 6, and at line 4 to complementary projection P̄ (to eliminate redundant tapes) described in section B.2:

    INTERSECT1(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r) → A :
    1  A ← CROSSPA(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾)
    2  for ∀i ∈ [[1, r]] do
    3      A ← AUTOINTERSECT(A, j_i, n + k_i)
    4  A ← P̄_(n+k₁, …, n+k_r)(A)
    5  return A

or (b) in another embodiment, while referring at line 1 to CrossPA( ) (to perform a cross-product operation on the WMTAs A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾) set forth in FIG. 5, at line 4 to the second AutoIntersect( ) method (to perform the auto-intersection operation for each pair of intersecting tapes j_i and k_i) set forth in FIG. 9, and at line 6 to complementary projection P̄ (to eliminate redundant tapes) described in section B.2:

    INTERSECT1(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r) → (A, boolean) :
    1  A ← CROSSPA(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾)
    2  regular ← true
    3  for ∀i ∈ [[1, r]] do
    4      (A, reg) ← AUTOINTERSECT(A, j_i, n + k_i)
    5      regular ← regular ∧ reg
    6  A ← P̄_(n+k₁, …, n+k_r)(A)
    7  return (A, regular)

The second embodiment, Intersect2( ), for carrying out the multi-tape intersection operation defined in section B.6 may be expressed in pseudocode as follows:

(a) in one embodiment, while referring at line 1 to IntersectCrossEps( ) set forth in FIG. 15 (to perform the single-tape intersection operation on one pair of tapes, for tapes j₁ with k₁ on the WMTAs A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾), at line 3 to AutoIntersect( ) set forth in FIG. 6 (to perform the auto-intersection operation for any intersecting tapes j_i with k_i, for i>1), and at line 4 to complementary projection P̄ (to eliminate redundant tapes) described in section B.2:

    INTERSECT2(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r) → A :
    1  A ← INTERSECTCROSSEPS(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁, k₁)
    2  for ∀i ∈ [[2, r]] do
    3      A ← AUTOINTERSECT(A, j_i, n + k_i)
    4  A ← P̄_(n+k₁, …, n+k_r)(A)
    5  return A

or (b) in another embodiment, while referring at line 1 to IntersectCrossEps( ) set forth in FIG. 15 (to perform the single-tape intersection operation on one pair of tapes, j₁ with k₁, on the WMTAs A₁⁽ⁿ⁾ and A₂⁽ᵐ⁾), at line 4 to AutoIntersect( ) set forth in FIG. 9 (to perform the auto-intersection operation for any intersecting tapes j_i with k_i, for i>1), and at line 6 to complementary projection P̄ (to eliminate redundant tapes) described in section B.2:

    INTERSECT2(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁ … j_r, k₁ … k_r) → (A, boolean) :
    1  A ← INTERSECTCROSSEPS(A₁⁽ⁿ⁾, A₂⁽ᵐ⁾, j₁, k₁)
    2  regular ← true
    3  for ∀i ∈ [[2, r]] do
    4      (A, reg) ← AUTOINTERSECT(A, j_i, n + k_i)
    5      regular ← regular ∧ reg
    6  A ← P̄_(n+k₁, …, n+k_r)(A)
    7  return (A, regular)
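To make the three stages of Intersect1( ) concrete, the following Python sketch applies them to finite weighted relations rather than to automata. It is an illustration of the definition only: representing a relation as a dictionary from string tuples to weights, the helper names, the 0-based tape indices, and the use of ordinary addition and multiplication as the semiring operations are assumptions of the sketch, and it cannot represent infinite languages or report regularity.

```python
def cross_product(r1, r2):
    """Pair every tuple of r1 with every tuple of r2, multiplying weights."""
    return {s1 + s2: w1 * w2 for s1, w1 in r1.items() for s2, w2 in r2.items()}


def auto_intersect(r, j, k):
    """Keep only the tuples whose strings on tapes j and k are equal."""
    return {s: w for s, w in r.items() if s[j] == s[k]}


def complementary_projection(r, drop):
    """Remove the tapes listed in drop, summing weights of merged tuples."""
    out = {}
    for s, w in r.items():
        key = tuple(x for i, x in enumerate(s) if i not in drop)
        out[key] = out.get(key, 0) + w
    return out


def intersect1(r1, r2, pairs):
    """Multi-tape intersection of two non-empty finite relations.

    pairs is a list of (j, k): tape j of r1 is intersected with tape k of r2.
    """
    n = len(next(iter(r1)))  # number of tapes of r1 (r1 assumed non-empty)
    r = cross_product(r1, r2)
    for j, k in pairs:
        r = auto_intersect(r, j, n + k)
    return complementary_projection(r, {n + k for _, k in pairs})


r1 = {("a", "x"): 2.0, ("b", "y"): 3.0}
r2 = {("a", "x"): 0.5, ("b", "z"): 1.0}
both_tapes = intersect1(r1, r2, [(0, 0), (1, 1)])
first_tape_only = intersect1(r1, r2, [(0, 0)])
```

Intersect2( ) computes the same relation but never materializes the full cross-product, which is why it is the more efficient of the two embodiments.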

C.4.3 Example

By way of example, a solution is presented for compiling a multi-tape intersection of the MTA A₁⁽²⁾ and the MTA A₂⁽²⁾ in accordance with the last method presented immediately above to produce the regular MTA A⁽²⁾ as follows:
$A^{(2)} = A_{1}^{(2)} \bigcap\limits_{\substack{1,1 \\ 2,2}} A_{2}^{(2)} = \overline{\mathcal{P}}_{3,4}\left(\mathcal{I}_{2,4}\left(\mathcal{I}_{1,3}\left(A_{1}^{(2)} \times A_{2}^{(2)}\right)\right)\right)$
where $A_{1}^{(2)} \bigcap\limits_{\substack{1,1 \\ 2,2}} A_{2}^{(2)}$ is given by:
$\begin{matrix} a & b \\ \varepsilon & A \end{matrix} \begin{pmatrix} c & a & b \\ B & \varepsilon & C \end{pmatrix}^{*} \begin{matrix} \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ A & B & C & \varepsilon & A \end{matrix} \;\bigcap\limits_{\substack{1,1 \\ 2,2}}\; \begin{matrix} \varepsilon \\ A \end{matrix} \begin{pmatrix} a & b & \varepsilon & c \\ B & \varepsilon & C & A \end{pmatrix}^{*}$
where each row of the preceding matrix-like representation sets forth one tape of each WMTA (e.g., A₁⁽²⁾ = <a, ε><b, A>(<c, B><a, ε><b, C>)*<ε, A><ε, B><ε, C><c, ε><ε, A>).

In accordance with the method set forth above, the multi-tape intersection operation is performed in three steps. First, the automaton B₁⁽⁴⁾ = I_(1,3)(A₁⁽²⁾×A₂⁽²⁾) is computed using single-tape intersection at line 1 (i.e., IntersectCrossEps( ) shown in FIG. 15) to obtain:
$\begin{matrix} \varepsilon & a & b \\ \varepsilon & \varepsilon & A \\ \varepsilon & a & b \\ A & B & \varepsilon \end{matrix} \begin{pmatrix} \varepsilon & c & a & b \\ \varepsilon & B & \varepsilon & C \\ \varepsilon & c & a & b \\ C & A & B & \varepsilon \end{pmatrix}^{*} \begin{matrix} \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ A & B & C & \varepsilon & A \\ \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ C & \varepsilon & \varepsilon & A & \varepsilon \end{matrix}$

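Next, the automaton B₂⁽⁴⁾ = I_(2,4)(B₁⁽⁴⁾) is computed using auto-intersection at line 4 (i.e., AutoIntersect( ) shown in FIG. 9) to obtain:
$\begin{matrix} \varepsilon & a & b \\ \varepsilon & \varepsilon & A \\ \varepsilon & a & b \\ A & B & \varepsilon \end{matrix} \begin{pmatrix} \varepsilon & c & a & b \\ \varepsilon & B & \varepsilon & C \\ \varepsilon & c & a & b \\ C & A & B & \varepsilon \end{pmatrix}^{1} \begin{matrix} \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ A & B & C & \varepsilon & A \\ \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ C & \varepsilon & \varepsilon & A & \varepsilon \end{matrix}$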

Finally, the automaton A⁽²⁾ = P̄_(3,4)(B₂⁽⁴⁾) is computed using complementary projection P̄ at line 6 (described in section B.2) to obtain:
$\begin{matrix} \varepsilon & a & b \\ \varepsilon & \varepsilon & A \end{matrix} \begin{pmatrix} \varepsilon & c & a & b \\ \varepsilon & B & \varepsilon & C \end{pmatrix}^{1} \begin{matrix} \varepsilon & \varepsilon & \varepsilon & c & \varepsilon \\ A & B & C & \varepsilon & A \end{matrix}$

D. Applications

This section describes different applications for using the operations on WMTAs that are described herein.

D.1 General Use

FIG. 16 sets forth an automaton W⁽¹⁾ for illustrating a proposed operation for part-of-speech (POS) disambiguation and how it may be used in natural language processing (NLP). The automaton W⁽¹⁾ shown in FIG. 16 is an automaton that represents a natural-language sentence, with one word w on each transition. In general, such an automaton can contain one or several paths, according to whether the sentence was unambiguously tokenized or not. In the example shown in FIG. 16, the automaton W⁽¹⁾ contains two paths, corresponding to an ambiguous tokenization into either three or four words. FIG. 17 illustrates the process for one path π_(j)(W) (which occurs similarly for all paths) of the automaton W⁽¹⁾.

Table 2 sets forth resources that may be used in this example, which are encoded as WMTAs.

TABLE 2
Automaton    Type Of Resource         Tapes
N⁽²⁾         normalizer               original word form, normalized form
L⁽³⁾         morphological lexicon    surface form, lemma, POS-tag
H⁽¹⁾         Hidden Markov Model      POS-tag sequence

This example can be expressed in pseudocode as follows:

    1  for ∀e ∈ E(W) do
    2      A(e)⁽¹⁾ ← P₂(A(e)⁽¹⁾ ∩ N⁽²⁾)
    3      A(e)⁽³⁾ ← A(e)⁽¹⁾ ∩ L⁽³⁾
    4      if A(e)⁽³⁾ = ⊥
    5          then A(e) ← . . .
    6  W⁽⁴⁾ ← W⁽⁴⁾ ∩ H⁽¹⁾
    7  W⁽⁴⁾ ← bestPath(W⁽⁴⁾)

At line 1 of the pseudocode above, each transition of the 1-tape automaton W⁽¹⁾ (i.e., each word of the sentence) is processed separately: A(e), representing the transition e ∈ E(W), is intersected with a normalizer N⁽²⁾ and only the second tape of the result is retained at line 2. Subsequently, A(e) is intersected with a 3-tape morphological lexicon L⁽³⁾ and becomes itself 3-tape as set forth in line 3 of the pseudocode and in FIG. 18. Specifically, FIG. 18 illustrates the intersection of an arc e of W⁽¹⁾ with a path π⁽³⁾ of the lexicon automaton L⁽³⁾. If this operation fails at line 4, then something else is done with A(e) at line 5, such as intersecting A(e) with another n-tape automaton.

When all transitions have been separately processed, as described above, processing begins again on the sentence automaton W⁽⁴⁾, which at this point has four tapes due to previous operations. At line 6, the sentence automaton W⁽⁴⁾ is intersected with a 1-tape automaton H⁽¹⁾ that represents an HMM as shown in FIG. 19. More specifically, FIG. 19 illustrates the intersection of a path π_(j)⁽⁴⁾ of the sentence automaton W⁽⁴⁾ with a path π⁽¹⁾ of the HMM automaton H⁽¹⁾. Finally, at line 7, only the best path of the sentence automaton W⁽⁴⁾ is retained.

D.2 Building a Lexicon from a Corpus

Using basic software programming utilities such as those forming part of UNIX, a list of inflected words with their POS-tags and their frequencies from an annotated corpus may be generated and stored in a file. Such a file may for example contain entries as shown in Table 3.

TABLE 3
Inflected Word    POS-tag    Frequency
leave             NN         7
leave             VB         154
leaves            NNS        18
leaves            VBZ        25
leaving           VBG        67
left              JJ         47
left              VBD        118
left              VBN        147

In one implementation, a WMTA lexicon may be created over the semiring $\langle \overline{\mathbb{R}}^{+}, +, \times, 0, 1 \rangle$, where each line of a file (as illustrated in Table 3) becomes the n-tape label of a path, such that each token of the line (or each column of Table 3) is placed onto a different tape, except for the last token (i.e., the frequency), which becomes the weight of the path. Using such an implementation and the example data in Table 3, a 2-tape corpus lexicon C⁽²⁾ may be constructed with paths having the labels l⁽²⁾ and the weights w as shown in Table 4.

TABLE 4
Label l⁽²⁾        Weight w
<leave, NN>       7
<leave, VB>       154
<leaves, NNS>     18
<leaves, VBZ>     25
<leaving, VBG>    67
<left, JJ>        47
<left, VBD>       118
<left, VBN>       147
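The mapping from file lines to weighted labels may be sketched in Python as follows. The sketch stores the lexicon as a dictionary from label tuples to weights rather than as an automaton; the function name `build_corpus_lexicon` and the whitespace-separated line format are assumptions made for illustration.

```python
def build_corpus_lexicon(lines):
    """Turn lines such as 'leave NN 7' into {('leave', 'NN'): 7}.

    Every token except the last goes onto its own tape; the last token
    (the frequency) becomes the weight of the path.
    """
    lexicon = {}
    for line in lines:
        *tokens, frequency = line.split()
        label = tuple(tokens)
        # Repeated labels accumulate their frequencies.
        lexicon[label] = lexicon.get(label, 0) + int(frequency)
    return lexicon


corpus_lines = [
    "leave NN 7",
    "leave VB 154",
    "leaves NNS 18",
    "leaves VBZ 25",
    "leaving VBG 67",
    "left JJ 47",
    "left VBD 118",
    "left VBN 147",
]
corpus_lexicon = build_corpus_lexicon(corpus_lines)
```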

D.3 Enhancing a Lexicon with Lemmas

The corpus lexicon defined in section D.2 may be enhanced with lemmas using another lexicon encoded as a non-weighted MTA, T⁽³⁾, having entries of the form <InflectedWord, Lemma, PosTag>, and containing at least those entries in Table 5.

TABLE 5
Inflected Word    Lemma    PosTag
leave             leave    NN
leave             leave    VB
leaves            leaf     NNS
leaves            leave    NNS
leaves            leave    VBZ
leaving           leave    VBG
left              left     JJ
left              leave    VBD
left              leave    VBN

Each of the entries in Table 5 is uniformly weighted with w = 1.0 by assigning 1.0 to each transition and final state. To distribute the probability among different lemmas for equal inflected forms and POS tags, the lexicon T⁽³⁾ is normalized with respect to tape 1 and tape 3 using multi-tape intersection, resulting in the entries shown in Table 6.

TABLE 6
Lexicon T⁽³⁾              Weight w
<leave, leave, NN>        1.0
<leave, leave, VB>        1.0
<leaves, leaf, NNS>       0.5
<leaves, leave, NNS>      0.5
<leaves, leave, VBZ>      1.0
<leaving, leave, VBG>     1.0
<left, left, JJ>          1.0
<left, leave, VBD>        1.0
<left, leave, VBN>        1.0

The intersection of the lexicon T⁽³⁾ (having values shown in column 1 of Table 6) with the corpus lexicon C⁽²⁾ (having values shown in Table 4) on the tapes of inflected forms and POS-tags, respectively, is defined as:
$L^{(3)} = T^{(3)} \bigcap\limits_{\substack{1,1 \\ 3,2}} C^{(2)}$

and results in the lexicon L⁽³⁾, the entries of which are shown in Table 7.

TABLE 7
Lexicon L⁽³⁾              Weight w
<leave, leave, NN>        7
<leave, leave, VB>        154
<leaves, leaf, NNS>       9
<leaves, leave, NNS>      9
<leaves, leave, VBZ>      25
<leaving, leave, VBG>     67
<left, left, JJ>          47
<left, leave, VBD>        118
<left, leave, VBN>        147

Finally, the lexicon L⁽³⁾ (having values shown in Table 7) is normalized with respect to inflected forms (i.e., the weights w of all entries with the same inflected form sum to one) to provide the entries shown in Table 8.

TABLE 8
Lexicon L⁽³⁾              Weight w
<leave, leave, NN>        0.043
<leave, leave, VB>        0.957
<leaves, leaf, NNS>       0.209
<leaves, leave, NNS>      0.209
<leaves, leave, VBZ>      0.581
<leaving, leave, VBG>     1.000
<left, left, JJ>          0.151
<left, leave, VBD>        0.378
<left, leave, VBN>        0.471
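The arithmetic behind Tables 6 to 8 may be checked with the following Python sketch. It works on dictionaries from label tuples to weights rather than on automata; the helper `normalize` and the 0-based tape indices are assumptions of the sketch, and the intersection with C⁽²⁾ is written out directly as a dictionary comprehension.

```python
from collections import defaultdict

# Entries of Table 5 (inflected word, lemma, POS tag).
T = [
    ("leave", "leave", "NN"), ("leave", "leave", "VB"),
    ("leaves", "leaf", "NNS"), ("leaves", "leave", "NNS"),
    ("leaves", "leave", "VBZ"), ("leaving", "leave", "VBG"),
    ("left", "left", "JJ"), ("left", "leave", "VBD"),
    ("left", "leave", "VBN"),
]
# Entries of Table 4 (inflected word, POS tag) with corpus frequencies.
C = {
    ("leave", "NN"): 7, ("leave", "VB"): 154, ("leaves", "NNS"): 18,
    ("leaves", "VBZ"): 25, ("leaving", "VBG"): 67, ("left", "JJ"): 47,
    ("left", "VBD"): 118, ("left", "VBN"): 147,
}


def normalize(relation, context):
    """Divide each weight by the total weight of the entries that share
    the same strings on the context tapes (0-based tape indices)."""
    totals = defaultdict(float)
    for label, w in relation.items():
        totals[tuple(label[i] for i in context)] += w
    return {
        label: w / totals[tuple(label[i] for i in context)]
        for label, w in relation.items()
    }


# Table 6: uniform weights, normalized with respect to word and POS tag.
t_weighted = normalize({entry: 1.0 for entry in T}, context=(0, 2))
# Table 7: intersection with the corpus lexicon on word and POS tag.
l_joined = {
    (word, lemma, tag): w * C[(word, tag)]
    for (word, lemma, tag), w in t_weighted.items()
    if (word, tag) in C
}
# Table 8: normalization with respect to the inflected form.
l_final = normalize(l_joined, context=(0,))
```

For instance, the two NNS readings of "leaves" each receive half of the corpus frequency 18, which gives the weight 9 in Table 7 and 9/43 ≈ 0.209 in Table 8.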

D.4 Normalizing a Lexicon

In this example, the conditional normalization of a WMTA, A₁⁽ⁿ⁺ᵐ⁾, is compiled over the semiring $\langle \overline{\mathbb{R}}^{+}, +, \times, 0, 1 \rangle$, with respect to some of its tapes, using a generalization of the method proposed by Eisner in “Parameter Estimation For Probabilistic Finite-state Transducers”, in Proceedings of the 40th Annual Meeting, pages 1-8, Philadelphia, Pa., USA, Association for Computational Linguistics, 2002, which is incorporated herein by reference. Consider, for instance, each string tuple u⁽ⁿ⁺ᵐ⁾ in the language ℒ₁⁽ⁿ⁺ᵐ⁾ of A₁⁽ⁿ⁺ᵐ⁾ to be a pair of string tuples (i.e., u⁽ⁿ⁺ᵐ⁾ = s⁽ⁿ⁾ : v⁽ᵐ⁾). Normalizing A₁⁽ⁿ⁺ᵐ⁾ conditionally with respect to tapes n+1 to n+m means compiling for each u⁽ⁿ⁺ᵐ⁾ the probability of s⁽ⁿ⁾ in the context of v⁽ᵐ⁾.

Suppose that, originally, the weight of each u⁽ⁿ⁺ᵐ⁾ is its frequency (i.e., the frequency of co-occurrence of the tuples s⁽ⁿ⁾ and v⁽ᵐ⁾), given by:
$w_{A_{1}}\left(u^{(n+m)}\right) = f\left(s^{(n)} : v^{(m)}\right)$

The context tapes of ℒ₁⁽ⁿ⁺ᵐ⁾ are projected and the following context language is obtained:
$\mathcal{L}_{c}^{(m)} = \mathcal{P}_{n+1,\ldots,n+m}\left(\mathcal{L}_{1}^{(n+m)}\right)$

Each string tuple v⁽ᵐ⁾ ∈ ℒ_c⁽ᵐ⁾ of the determinized context automaton A_c⁽ᵐ⁾ has the weight given by:
$w\left(v^{(m)}\right) = \bigoplus\limits_{v_{i}^{(m)} = v^{(m)}} w_{A_{1}}\left(s_{i}^{(n)} : v_{i}^{(m)}\right) = f\left(v^{(m)}\right)$

The weight of all string tuples v⁽ᵐ⁾ ∈ ℒ_c⁽ᵐ⁾ is inverted by inverting the weight of each transition and each final state of A_c⁽ᵐ⁾. The resulting weight of each v⁽ᵐ⁾ is given by:
$w_{A_{c}}\left(v^{(m)}\right) = f\left(v^{(m)}\right)^{-1}$

Finally, the original A₁⁽ⁿ⁺ᵐ⁾ is intersected with A_c⁽ᵐ⁾ on all context tapes, and an automaton A₂⁽ⁿ⁺ᵐ⁾ is obtained with the language given by:
$\mathcal{L}_{2}^{(n+m)} = \mathcal{L}_{1}^{(n+m)} \bigcap\limits_{\substack{n+1,1 \\ \ldots \\ n+m,m}} \mathcal{L}_{c}^{(m)}$

All string tuples u⁽ⁿ⁺ᵐ⁾ = s⁽ⁿ⁾ : v⁽ᵐ⁾ ∈ ℒ₂⁽ⁿ⁺ᵐ⁾ have the weight given by:
$\begin{matrix}
w_{A_{2}}\left(u^{(n+m)}\right) = w_{A_{1}}\left(s^{(n)} : v^{(m)}\right) \otimes w_{A_{c}}\left(v^{(m)}\right) \\
= f\left(s^{(n)} : v^{(m)}\right) \cdot f\left(v^{(m)}\right)^{-1} \\
= p\left(s^{(n)} \mid v^{(m)}\right)
\end{matrix}$

Those skilled in the art will appreciate that this approach does not require the context tapes to be consecutive.

To compile the joint normalization of a WMTA, A₁⁽ⁿ⁾, over the semiring $\langle \overline{\mathbb{R}}^{+}, +, \times, 0, 1 \rangle$, the frequency of each string tuple s⁽ⁿ⁾ ∈ ℒ(A₁⁽ⁿ⁾) is multiplied by the inverse of the total frequency f_T of all string tuples. The total frequency f_T can be obtained by replacing the label of each transition of A₁⁽ⁿ⁾ with ε and applying an epsilon removal (as described for example by Mohri in “Generic epsilon-removal and input epsilon-normalization algorithms for weighted transducers”, in International Journal of Foundations of Computer Science, 13(1):129-143, 2002, which is incorporated herein by reference). The resulting automaton, A_T, has one single state, without any outgoing transitions, and a final weight that equals the total frequency, Q = f_T. The final weight Q is inverted and A₁⁽ⁿ⁾ is concatenated with A_T. All string tuples s⁽ⁿ⁾ ∈ ℒ(A₂⁽ⁿ⁾) of the resulting automaton A₂⁽ⁿ⁾ = A₁⁽ⁿ⁾ A_T have the weight given by:
$\begin{matrix}
w_{A_{2}}\left(s^{(n)}\right) = w_{A_{1}}\left(s^{(n)}\right) \otimes w_{A_{T}}\left(\varepsilon^{(n)}\right) \\
= f\left(s^{(n)}\right) \cdot f_{T}^{-1} \\
= p\left(s^{(n)}\right)
\end{matrix}$
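As a worked illustration of both normalizations, the following lines instantiate the formulas above with the frequencies of Tables 4 and 7. Applying the joint normalization to the corpus lexicon C⁽²⁾ is an assumption made here purely for illustration; the specification itself does not carry out that step.

```latex
% Conditional normalization of L^{(3)} (Table 7) with respect to the
% inflected form, for the context v = leaves:
\begin{align*}
f(\text{leaves}) &= 9 + 9 + 25 = 43 \\
p(\langle \text{leaf}, \text{NNS} \rangle \mid \text{leaves})
  &= 9 \cdot 43^{-1} \approx 0.209 \\
p(\langle \text{leave}, \text{VBZ} \rangle \mid \text{leaves})
  &= 25 \cdot 43^{-1} \approx 0.581
\end{align*}

% Joint normalization of C^{(2)} (Table 4):
\begin{align*}
f_T &= 7 + 154 + 18 + 25 + 67 + 47 + 118 + 147 = 583 \\
p(\langle \text{leave}, \text{VB} \rangle)
  &= 154 \cdot 583^{-1} \approx 0.264
\end{align*}
```

The conditional values agree with the corresponding entries of Table 8.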

D.5 Using a Lexicon

When using a lexicon WMTA, A⁽ⁿ⁾, the following may be specified: r input tapes, j₁ to j_r, and x output tapes, k₁ to k_x, which do not have to be consecutive. A weighted r-tuple of input strings, s⁽ʳ⁾, may first be converted into an input WMTA, I⁽ʳ⁾, having one single path with the specified label, s⁽ʳ⁾, and weight, w(s⁽ʳ⁾). Subsequently, to obtain the output WMTA, O⁽ˣ⁾, whose language contains all weighted x-tuples of output strings, v⁽ˣ⁾, the following multi-tape intersection and projection may be used:
$O^{(x)} = \mathcal{P}_{k_{1},\ldots,k_{x}}\left(A^{(n)} \bigcap\limits_{\substack{j_{1},1 \\ \ldots \\ j_{r},r}} I^{(r)}\right)$
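The lookup described above may be sketched in Python as follows, again on a finite lexicon stored as a dictionary from label tuples to weights. The function name `lookup`, its parameters, and the 0-based tape indices are hypothetical; the example entries are the "leaves" rows of Table 8.

```python
def lookup(lexicon, input_tapes, inputs, output_tapes, input_weight=1.0):
    """Intersect the lexicon with one weighted input tuple and project.

    input_tapes  - tapes of the lexicon that must match the input strings
    inputs       - the input strings, one per input tape
    output_tapes - tapes of the lexicon kept in the result
    """
    result = {}
    for label, w in lexicon.items():
        if all(label[j] == s for j, s in zip(input_tapes, inputs)):
            out = tuple(label[k] for k in output_tapes)
            # Weights of entries that project to the same output are summed.
            result[out] = result.get(out, 0.0) + w * input_weight
    return result


lexicon = {
    ("leaves", "leaf", "NNS"): 0.209,
    ("leaves", "leave", "NNS"): 0.209,
    ("leaves", "leave", "VBZ"): 0.581,
    ("leaving", "leave", "VBG"): 1.000,
}
analyses = lookup(lexicon, (0,), ("leaves",), (1, 2))
```

Because the input and output tapes are passed as index tuples, they need not be consecutive, and the same lexicon can be queried in several directions (for example, from lemma and POS tag to inflected form).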

D.6 Searching for Similarities

In the applications described in this section, string tuples s⁽ⁿ⁾ may be searched for in the language ℒ₁⁽ⁿ⁾ of a WMTA A₁⁽ⁿ⁾ whose strings s_(j₁) to s_(j_r) are similar to its strings s_(k₁) to s_(k_r), respectively. The comparison of each pair of tapes, j_i and k_i, may be done independently from all other pairs of tapes. Hence the task may be reduced to comparing two tapes, j and k.

First, a 2-tape automaton, R⁽²⁾, may be created, whose language, ℒ_R⁽²⁾, describes the requested relation between tapes j and k of ℒ₁⁽ⁿ⁾. To obtain the language ℒ₂⁽ⁿ⁾ ⊂ ℒ₁⁽ⁿ⁾ in which this relation holds, the following language may be compiled:
$\mathcal{L}_{2}^{(n)} = \mathcal{L}_{1}^{(n)} \bigcap\limits_{\substack{j,1 \\ k,2}} R^{(2)}$

In one specific example, suppose there exists an English-German dictionary that is encoded as a 3-tape WMTA, with each entry being of the form <English, German, PosTag>. To find all words that are similar in the two languages, while having the same POS tag, a 2-tape automaton, R⁽²⁾, may be created that describes this similarity by using, for example, either of the approaches referred to herein as the “Levenshtein Distance” or the “List of Grapheme Correspondences” described below.

The Levenshtein Distance between two strings is the minimal number of insertions, deletions, or substitutions of symbols that are needed to transform one string into the other, as disclosed by Schulz and Mihov in “Fast string correction with Levenshtein automata”, International Journal on Document Analysis and Recognition, 5(1):67-85, 2002, which is incorporated herein by reference. A WMTA, Lv_d⁽²⁾, having a Levenshtein Distance d between the members s₁ and s₂ of all string tuples s⁽²⁾ ∈ ℒ(Lv_d⁽²⁾) can be compiled from the regular expression:
$Lv_{d}^{(2)} = \left( (?\,{:}_{i}\,?)^{*}\ (?{:}?\ \cup\ ?{:}\varepsilon\ \cup\ \varepsilon{:}?)\ (?\,{:}_{i}\,?)^{*} \right)^{d}$
where ? means any symbol (i.e., ? ∈ {a, b, c, . . . }), and :_i is an identity pairing such that (? :_i ?) ∈ {a:a, b:b, c:c, . . . }, whereas (?:?) ∈ {a:a, a:b, b:a, . . . }.
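For reference, the distance itself may be computed directly with the standard dynamic-programming recurrence, sketched below in Python. This is not the automaton Lv_d⁽²⁾ described above; it is a plain string-to-string computation that can be used to check whether a given pair of strings lies within distance d. The function name `levenshtein` is an assumption of the sketch.

```python
def levenshtein(s, t):
    """Minimal number of insertions, deletions, and substitutions
    needed to transform string s into string t."""
    # previous[j] holds the distance between the processed prefix of s
    # and the first j symbols of t.
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from s
                current[j - 1] + 1,          # insert b into s
                previous[j - 1] + (a != b),  # substitute a by b, or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("house", "haus")
```

A pair of dictionary entries could then be retained when `levenshtein(english, german) <= d` for a chosen threshold d.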

The List of Grapheme Correspondences may be produced by manually writing a list of synchronic grapheme correspondences resulting from historical phonological alterations in the English and German languages, and may, for example, contain the entries in Table 9 (for English in column 1 and German in column 2).

TABLE 9
ENGLISH    GERMAN
th         d
th         ss
d          t

Using the lexicon construction method described in sections D.2-D.5, the list of entries in Table 9 may be used to construct a lexicon-like automaton, Ch⁽²⁾, that encodes these changes in the form set forth in Table 10 (where column 1 identifies the lexicon entry, and column 2 identifies the weight).

TABLE 10
Lexicon Ch⁽²⁾    Weight w
<th, d>          1
<th, ss>         1
<d, t>           1

The lexicon-like automaton Ch⁽²⁾ may then be used to obtain a WMTA, R⁽²⁾, whose language describes the relation between any English word and its potential German form that is given by:
$R^{(2)} = \left( Ch^{(2)}\ \cup\ ?\,{:}_{i}\,? \right)^{+}$
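Membership of a word pair in a relation of this shape may be tested with the following Python sketch, which segments the two words into either identical symbols or listed correspondences. It ignores weights and does not build the automaton R⁽²⁾; the function name `related` and the example word pairs are assumptions chosen for illustration.

```python
from functools import lru_cache

# Entries of Table 9 as (English, German) pairs.
CORRESPONDENCES = (("th", "d"), ("th", "ss"), ("d", "t"))


def related(english, german, pairs=CORRESPONDENCES):
    """True if the pair can be split into one or more segments, each of
    which is an identity pair or a listed grapheme correspondence."""

    @lru_cache(maxsize=None)
    def match(i, j):
        if i == len(english) and j == len(german):
            return True
        # Identity pairing of a single symbol.
        if (i < len(english) and j < len(german)
                and english[i] == german[j] and match(i + 1, j + 1)):
            return True
        # One of the listed correspondences.
        for e, g in pairs:
            if (english.startswith(e, i) and german.startswith(g, j)
                    and match(i + len(e), j + len(g))):
                return True
        return False

    # The '+' closure requires at least one segment, so the empty pair
    # is excluded.
    return bool(english) and match(0, 0)


bath_bad = related("bath", "bad")
```

Under these correspondences "thing" is related to "ding" and "bath" to "bad", while "bath" is not related to "bat".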

D.7 Preserving Intermediate Transduction Results

In this section, the advantage of WMTAs is illustrated through transduction cascades, which are frequently used in language and speech processing. In a (classical) weighted transduction cascade, T₁⁽²⁾ . . . T_r⁽²⁾, a set of weighted strings, encoded as a weighted acceptor, L₀⁽¹⁾, is composed with the transducer, T₁⁽²⁾, on its input tape as shown in FIG. 20. The output projection of this composition is the first intermediate result, L₁⁽¹⁾, of the cascade. It is further composed with the second transducer, T₂⁽²⁾, which leads to the second intermediate result, L₂⁽¹⁾, etc. The projection of the last transducer is the final result, L_r⁽¹⁾, which is defined as follows:
$L_{i}^{(1)} = \mathcal{P}_{2}\left(L_{i-1}^{(1)} \diamond T_{i}^{(2)}\right) \quad \text{for } i \in [\![1,r]\!]$
At any point in this cascade, previous results cannot be accessed.

In a weighted transduction cascade, $A_1^{(n_1)} \ldots A_r^{(n_r)}$, that uses WMTAs and multi-tape intersection, intermediate results can be preserved and used by all subsequent transductions. For example, assume that the two previous results at each point in the cascade (except in the first transduction) are to be used in computing the results. This operation requires all intermediate results, L_i⁽²⁾, to have two tapes as shown in FIG. 2 and defined as follows:
$\begin{matrix}
L_{1}^{(2)} = L_{0}^{(1)} \bigcap\limits_{1,1} A_{1}^{(2)} \\
L_{i}^{(2)} = \mathcal{P}_{2,3}\left(L_{i-1}^{(2)} \bigcap\limits_{\substack{1,1 \\ 2,2}} A_{i}^{(3)}\right) \quad \text{for } i \in [\![2, r-1]\!] \\
L_{r}^{(2)} = \mathcal{P}_{3}\left(L_{r-1}^{(2)} \bigcap\limits_{\substack{1,1 \\ 2,2}} A_{r}^{(3)}\right)
\end{matrix}$
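A cascade in which each step sees its two predecessors may be sketched in Python as follows. The sketch uses unweighted finite relations (sets of string tuples) instead of WMTAs, and the three stage relations and their contents are hypothetical examples invented for illustration.

```python
def first_step(l0, a1):
    """L1 = L0 intersected with tape 1 of A1; both tapes are kept."""
    return {(x, y) for (x, y) in a1 if x in l0}


def middle_step(previous, stage):
    """Intersect a 2-tape result with tapes 1 and 2 of a 3-tape stage
    and keep tapes 2 and 3, so the next stage sees two predecessors."""
    return {(b, c) for (a, b, c) in stage if (a, b) in previous}


def last_step(previous, stage):
    """Same intersection as middle_step, keeping only the final tape."""
    return {c for (a, b, c) in stage if (a, b) in previous}


# Hypothetical stages: surface form -> lemma, (surface, lemma) -> POS tag,
# (lemma, POS tag) -> analysis string.
l0 = {"left"}
a1 = {("left", "leave"), ("left", "left"), ("leaves", "leaf")}
a2 = {("left", "leave", "VBD"), ("left", "left", "JJ")}
a3 = {("leave", "VBD", "leave+VBD"), ("left", "JJ", "left+JJ")}

l1 = first_step(l0, a1)
l2 = middle_step(l1, a2)
l3 = last_step(l2, a3)
```

In the second stage the POS tag is chosen from both the surface form and the lemma, which a classical cascade of 2-tape transducers could not do once the surface form had been projected away.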

This augmented descriptive power is also available if the whole cascade is intersected into a single WMTA, A⁽²⁾ (although A⁽²⁾ has only two tapes in the example). Each of the “incorporated” multi-tape sub-relations in A⁽²⁾ (except for the first one) will still refer to its two predecessors as follows:
$\begin{matrix}
A_{1 \ldots i}^{(3)} = \mathcal{P}_{1,n-1,n}\left(A_{1 \ldots i-1}^{(m)} \bigcap\limits_{\substack{n-1,1 \\ n,2}} A_{i}^{(3)}\right) \quad \text{for } i \in [\![2,r]\!],\ m \in \{2,3\} \\
A^{(2)} = \mathcal{P}_{1,n}\left(A_{1 \ldots r}\right)
\end{matrix}$

Advantageously, this example illustrates how intermediate results can be preserved in transduction cascades so that they can be accessed by any of the following transductions.

D.8 Example System

FIG. 22 illustrates a general purpose computer system 1610 for carrying out NLP in accordance with the present invention. The system 1610 includes hardware 1612 and software 1614. The hardware 1612 is made up of a processor (i.e., CPU) 1616, memory 1618 (ROM, RAM, etc.), persistent storage 1620 (e.g., CD-ROM, hard drive, floppy drive, tape drive, etc.), user I/O 1622, and network I/O 1624. The user I/O 1622 can include a keyboard 1626, a pointing device 1628 (e.g., pointing stick, mouse, etc.), microphone 1608, camera 1604, speakers 1606, and the display 1630. The network I/O 1624 may for example be coupled to a network 1632 such as the Internet. The software 1614 of the system 1610 includes an operating system 1636, a regular expression compiler 1638, WMTA and MTA methods 1640 (e.g., auto-intersection and tape-intersection), and NLP methods and applications 1642. In one embodiment, the natural language processing methods and applications 1642 use WMTAs and MTAs that are stored in memory 1618 and that are compiled, for example, from regular expressions using compiler 1638, to perform, singly or in combination, one or more of POS tagging, tokenization, phonological and morphological analysis, disambiguation, spelling correction, translation, entity extraction, and shallow parsing.

E. Miscellaneous

Although the present invention is generally directed at weighted automata, each operation described in the foregoing specification may be used to operate on non-weighted automata as well. Additional background of the invention is described in the following publications incorporated herein by reference: Andre Kempe, Christof Baeijs, Tamas Gaal, Franck Guingne, Florent Nicart, “WFSC—A new weighted finite state compiler”, 8th Int. Conf. on Implementation and Application of Automata (CIAA 03), Santa Barbara, Calif., USA, Jul. 16-18, 2003 (which describes an example framework for carrying out the weighted finite state operations set forth herein); and Andre Kempe, “NLP Applications based on weighted multi tape automata”, TALN, Fes, Morocco, Apr. 19-22, 2004 (which describes an additional example related to section D.6 for extracting similar words in French and Spanish).

Compared with (weighted) 1-tape or 2-tape systems for processing automata, the foregoing specification for processing n-tape automata advantageously permits: (a) the separation of different types of information used in NLP over different tapes (e.g., surface form, lemma, POS-tag, domain-specific information, etc.); (b) the preservation of some or all intermediate results of various NLP steps on different tapes; and (c) the possibility of defining and implementing contextual replace rules referring to different types of information on different tapes. Contextual replace rules are more fully described in the following publications, which are incorporated herein by reference: Kaplan and Kay, “Regular models of phonological rule systems”, Computational Linguistics 20(3):331-378, 1994; Karttunen, “The replace operator”, Proceedings of the 33rd Annual Meeting, Cambridge, Mass., USA, Association for Computational Linguistics, pages 16-23, 1995; and Kempe and Karttunen, “Parallel replacement in finite-state calculus”, Proceedings of the 16th International Conference on Computational Linguistics (CoLing), volume 2, pages 622-627, Copenhagen, Denmark, ACL, 1996.

In addition, it will be appreciated by those skilled in the art that the foregoing specification provides the following advantages that are difficult to obtain using 1-tape or 2-tape automata: (a) a simplified description, implementation, modification, and maintenance of a sequence of NLP tasks (as described for example in section D.1); and (b) the ability to execute a sequence of NLP tasks in a single tool via an end-user interface enabled for performing the operations on automata described herein.

Using the foregoing specification, the invention may be implemented as a machine (or system), process (or method), or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware, or any combination thereof. It will be appreciated by those skilled in the art that the flow diagrams described in the specification are meant to provide an understanding of different possible embodiments of the invention. As such, alternative ordering of the steps, performing one or more steps in parallel, and/or performing additional or fewer steps may be done in alternative embodiments of the invention.

Any resulting program(s), having computer-readable program code, may be embodied within one or more computer-usable media such as memory devices or transmitting devices, thereby making a computer program product or article of manufacture according to the invention. As such, the terms “article of manufacture” and “computer program product” as used herein are intended to encompass a computer program existent (permanently, temporarily, or transitorily) on any computer-usable medium such as on any memory device or in any transmitting device.

Executing program code directly from one medium, storing program code onto a medium, copying the code from one medium to another medium, transmitting the code using a transmitting device, or other equivalent acts may involve the use of a memory or transmitting device which only embodies program code transitorily as a preliminary or final step in making, using, or selling the invention.

Memory devices include, but are not limited to, fixed (hard) disk drives, floppy disks (or diskettes), optical disks, magnetic tape, and semiconductor memories such as RAM, ROM, PROMs, etc. Transmitting devices include, but are not limited to, the Internet, intranets, electronic bulletin board and message/note exchanges, telephone/modem based network communication, hard-wired/cabled communication network, cellular communication, radio wave communication, satellite communication, and other stationary or mobile network systems/communication links.

A machine embodying the invention may involve one or more processing systems including, but not limited to, CPU, memory/storage devices, communication links, communication/transmitting devices, servers, I/O devices, or any subcomponents or individual parts of one or more processing systems, including software, firmware, hardware, or any combination or subcombination thereof, which embody the invention as set forth in the claims.

The claims, as originally presented and as they may be amended, encompass variations, alternatives, modifications, improvements, equivalents, and substantial equivalents of the embodiments and teachings disclosed herein, including those that are presently unforeseen or unappreciated, and that, for example, may arise from applicants/patentees and others.

1. In a system for processing natural language, a method for intersecting a first selected tape and a second selected tape of a selected path of a multi-tape automaton (MTA) having a plurality of n tapes and a plurality of paths, comprising: (a) generating a string tuple <s₁, . . . , s_(n)> having a string s for each of the n tapes of the selected path of the MTA; (b) comparing the string s_(j) of the first selected tape with the string s_(k) of the second selected tape in the string tuple generated at (a); (c) if the strings s_(j) and s_(k) equal at (b), retaining the string tuple in the MTA; (d) if the strings s_(j) and s_(k) do not equal at (b), restructuring the MTA to remove the string tuple while retaining other paths in the MTA; (e) repeating (a)-(d) for at least another path of the MTA by specifying the one or more paths in turn as the selected path; (f) outputting the MTA restructured at (d) in which all paths in the MTA are removed except for those having equal strings on the first selected tape and the second selected tape of the string tuples generated at (a).
2. The method according to claim 1, wherein each tape of the MTA comprises elements of at least two natural languages.
3. The method according to claim 1, wherein the MTA at (a) is the cross product of two input multi-tape automata.
4. The method according to claim 3, further comprising repeating (a)-(f) for at least one or both of a third selected tape and a fourth selected tape of the MTA, wherein no more than one is from a set consisting of the first selected tape and the second selected tape of the MTA.
5. The method according to claim 3, further comprising removing redundant strings in the string tuples of the restructured MTA.
6. The method according to claim 1, wherein the MTA has weighted transitions.
7. The method according to claim 6, wherein the weights of the transitions of the MTA are probabilities.
8. The method according to claim 1, wherein restructuring the MTA to remove the string tuple at (d) further comprises removing a transition of the MTA.
9. The method according to claim 1, wherein the MTA has one or more cycles.
10. The method according to claim 1, wherein (a)-(f) is performed to carry out one or more of morphological analysis, part-of-speech tagging, disambiguation, and entity extraction.
11. The method according to claim 1, wherein (a)-(f) is performed by computing a delay between the strings of the two tapes.
12. The method according to claim 11, further comprising using the computed delay to determine whether the MTA is regular.
13. The method according to claim 12, wherein the MTA is regular if the computed delay does not exceed a predetermined limit at any state of the MTA.
14. The method according to claim 12, wherein if the MTA is determined to be regular then the MTA output at (f) is a complete solution, otherwise the MTA output at (f) is a partial solution.
15. In a system for processing natural language, a method for intersecting a first tape and a second tape of an input multi-tape automaton (MTA) having a plurality of tapes and a plurality of paths, comprising: (a) computing a first limit and a second limit of the input MTA; (b) constructing an output MTA that intersects the first tape and the second tape using the second limit to delimit its construction; said constructing removing transitions along paths in the output MTA except for those transitions of paths having similar labels on the first selected tape and the second selected tape; (c) determining if the output MTA is regular using the first limit; (d) if the output MTA is determined to be regular at (c), the output MTA being provided as a complete solution to the intersection of the first tape and the second tape of the input MTA; (e) if the output MTA is determined not to be regular at (c), the output MTA being provided as a partial solution to the intersection of the first tape and the second tape of the input MTA.
16. The method according to claim 15, further comprising computing the first limit by compiling a first maximum delay by traversing the input MTA automaton and measuring delays on all its paths.
17. The method according to claim 16, further comprising computing the second limit by compiling a second maximum delay similar to the first delay while traversing an additional cycle of the input MTA.
18. The method according to claim 15, wherein the MTA is the cross product of a first MTA and a second MTA.
19. The method according to claim 18, further comprising repeating (a)-(e) for at least one or both of a third selected tape and a fourth selected tape of the MTA, wherein no more than one is from a set consisting of the first selected tape and the second selected tape of the MTA.
20. The method according to claim 15, wherein the MTA has weighted transitions.
21. The method according to claim 15, wherein (a)-(e) is performed to carry out one or more of morphological analysis, part-of-speech tagging, disambiguation, and entity extraction.
22. In a system for processing natural language, a method for intersecting a first tape and a second tape of a multi-tape automaton (MTA) having a plurality of n tapes and a plurality of paths, comprising: (a) generating a string tuple <s₁, . . . , s_(n)> having a string s for each of the n tapes of each path of the MTA; (b) comparing the string s_(j) of the first tape with the string s_(k) of the second tape in the string tuple; (c) if the strings s_(j) and s_(k) equal at (b), retaining the string tuple in the MTA; (d) if the strings s_(j) and s_(k) do not equal at (b), restructuring the MTA to remove the string tuple.
23. The method according to claim 22, further comprising removing redundant strings in the string tuples of the MTA.